VitoMakarevich opened a new pull request, #11450:
URL: https://github.com/apache/hudi/pull/11450
If `"spark.hadoop.parquet.avro.write-old-list-structure", "false"` is explicitly
set (the only way to be able to write nulls inside arrays), Hudi starts writing
Parquet files with the following schema:
```
required group internal_list (LIST) {
  repeated group list {
    required int64 element;
  }
}
```
```
But files produced before setting
`"spark.hadoop.parquet.avro.write-old-list-structure", "false"` have the
following schema:
```
required group internal_list (LIST) {
  repeated int64 array;
}
```
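The two layouts above can be told apart mechanically. The sketch below is a simplified Python model of the backward-compatibility rules the Parquet format specification gives for `LIST` groups; `Node` and `list_structure` are hypothetical names used for illustration, not parquet-mr or Hudi APIs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical minimal model of a Parquet schema node (not a parquet-mr class)."""
    name: str
    repetition: str                   # "required", "optional" or "repeated"
    primitive: Optional[str] = None   # e.g. "int64"; None means the node is a group
    children: List["Node"] = field(default_factory=list)

    @property
    def is_group(self) -> bool:
        return self.primitive is None


def list_structure(list_group: Node) -> str:
    """Return "2-level" or "3-level" for a LIST-annotated group (simplified rules)."""
    repeated = list_group.children[0]  # a LIST group holds a single repeated field
    if not repeated.is_group:
        # e.g. `repeated int64 array;`: the repeated field is the element itself
        return "2-level"
    if len(repeated.children) > 1:
        # a repeated group with several fields is itself a (record) element
        return "2-level"
    if repeated.name in ("array", list_group.name + "_tuple"):
        # legacy names for a single-field record element
        return "2-level"
    # otherwise the repeated group wraps the element: the standard 3-level layout
    return "3-level"


# The two schemas shown above
old_style = Node("internal_list", "required", children=[
    Node("array", "repeated", primitive="int64")])
new_style = Node("internal_list", "required", children=[
    Node("list", "repeated", children=[
        Node("element", "required", primitive="int64")])])

print(list_structure(old_style))  # 2-level
print(list_structure(new_style))  # 3-level
```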
Hudi 0.14.x, at least, fails to read records from such files with the
exception
`Caused by: java.lang.RuntimeException: Null-value for required field: `
even though the array contents are `not null` (in fact they cannot be null,
since parquet-avro requires `spark.hadoop.parquet.avro.write-old-list-structure`
= `false` to write `null`s).
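Why only the 3-level layout can hold `null` elements follows from Parquet's definition levels: a leaf's maximum definition level is the number of non-`required` fields on its path, and a null element needs a level strictly between "the list entry exists" and "the value is present". The Python sketch below is an illustrative model (the function names are hypothetical); each path is the list of repetitions from the list group down to the leaf.

```python
from typing import List


def max_definition_level(path: List[str]) -> int:
    """Number of non-required fields on the path = highest definition level."""
    return sum(1 for repetition in path if repetition != "required")


def can_encode_null_element(path: List[str]) -> bool:
    """True if some definition level means "list entry exists but value is null"."""
    # level reached once the repeated (list entry) field is defined
    entry_level = max_definition_level(path[: path.index("repeated") + 1])
    # a null element needs at least one optional field below the repeated one
    return max_definition_level(path) > entry_level


two_level = ["required", "repeated"]                         # internal_list / array
three_level_required = ["required", "repeated", "required"]  # internal_list / list / element
three_level_optional = ["required", "repeated", "optional"]

print(can_encode_null_element(two_level))             # False
print(can_encode_null_element(three_level_required))  # False
print(can_encode_null_element(three_level_optional))  # True
```

By the same reasoning, the 3-level schema quoted at the top, with `required int64 element`, cannot store nulls either; a nullable element would be declared `optional`.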
### Expected behavior
Observed with Hudi 0.12.1 (it is not clear exactly which change broke this):
1. If a file has the 2-level structure and an update arrives with
`"spark.hadoop.parquet.avro.write-old-list-structure", "false"` (whether or not
the arrays contain nulls; both cases behave the same), the file is rewritten
with the 3-level structure. (**Fails in 0.14.1.**)
2. If a file has the 3-level structure with nulls and an update arrives (with
or without nulls), it is read and written correctly.
A simple reproduction of the issue can be found here:
https://github.com/VitoMakarevich/hudi-issue-014
Most likely, the problem appeared after a change in Hudi that made values from
the Hadoop conf propagate into the reader instance (they were likely not
propagated before).
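To make the hypothesis concrete: Spark copies every `spark.hadoop.*` property into the Hadoop `Configuration` with the `spark.hadoop.` prefix stripped, so a reader handed that configuration would see the key `parquet.avro.write-old-list-structure`, whose parquet-avro default is `true`. The Python sketch below models only that lookup; the function names are hypothetical, and treating `false` as "the reader expects 3-level lists" is an assumption based on the description above.

```python
from typing import Dict

SPARK_HADOOP_PREFIX = "spark.hadoop."
WRITE_OLD_LIST_STRUCTURE = "parquet.avro.write-old-list-structure"


def hadoop_conf_from_spark_conf(spark_conf: Dict[str, str]) -> Dict[str, str]:
    """Model of Spark copying `spark.hadoop.*` entries into the Hadoop conf."""
    return {
        key[len(SPARK_HADOOP_PREFIX):]: value
        for key, value in spark_conf.items()
        if key.startswith(SPARK_HADOOP_PREFIX)
    }


def reader_expects_3_level(hadoop_conf: Dict[str, str]) -> bool:
    """Assumed reader behaviour: follow the flag, defaulting to "true" (old 2-level)."""
    return hadoop_conf.get(WRITE_OLD_LIST_STRUCTURE, "true").lower() == "false"


spark_conf = {
    "spark.hadoop.parquet.avro.write-old-list-structure": "false",
    "spark.executor.memory": "4g",
}
hadoop_conf = hadoop_conf_from_spark_conf(spark_conf)

print(hadoop_conf)                          # {'parquet.avro.write-old-list-structure': 'false'}
print(reader_expects_3_level({}))           # False: value not propagated, default applies
print(reader_expects_3_level(hadoop_conf))  # True: value propagated into the reader
```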
### Change Logs
Added an explicit override of
`spark.hadoop.parquet.avro.write-old-list-structure` = `true` when the file
being read is old (has the 2-level structure).
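A minimal sketch of the override described above, assuming the reader receives a per-file copy of the Hadoop configuration (modelled here as a `dict`) and that the file's list layouts have already been classified as `"2-level"` or `"3-level"`. `reader_conf_for_file` is a hypothetical name, not the code in this PR; note that on the Hadoop side the key carries no `spark.hadoop.` prefix.

```python
from typing import Dict, Iterable

WRITE_OLD_LIST_STRUCTURE = "parquet.avro.write-old-list-structure"


def reader_conf_for_file(base_conf: Dict[str, str],
                         file_list_structures: Iterable[str]) -> Dict[str, str]:
    """Return the configuration to use when reading one Parquet file."""
    conf = dict(base_conf)  # copy, so the job-level (writer) setting is untouched
    if "2-level" in file_list_structures:
        # old file: read its lists with the legacy 2-level interpretation
        conf[WRITE_OLD_LIST_STRUCTURE] = "true"
    return conf


job_conf = {WRITE_OLD_LIST_STRUCTURE: "false"}  # needed to write nulls in arrays

old_file_conf = reader_conf_for_file(job_conf, ["2-level"])
new_file_conf = reader_conf_for_file(job_conf, ["3-level"])

print(old_file_conf[WRITE_OLD_LIST_STRUCTURE])  # true
print(new_file_conf[WRITE_OLD_LIST_STRUCTURE])  # false
print(job_conf[WRITE_OLD_LIST_STRUCTURE])       # false (writer still uses 3-level)
```

Copying rather than mutating keeps the override local to the read path, so rewritten files still get the 3-level layout, matching expected behavior 1.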
### Impact
Running tests to make sure no unexpected issues propagate.
### Risk level (write none, low medium or high below)
medium
### Documentation Update
_Describe any necessary documentation update if there is any new feature,
config, or user-facing change. If not, put "none"._
- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._
### Contributor's checklist
- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
--
This is an automated message from the Apache Git Service.