VitoMakarevich opened a new pull request, #11450:
URL: https://github.com/apache/hudi/pull/11450

   If `"spark.hadoop.parquet.avro.write-old-list-structure", "false"` is set 
explicitly (the only way to write nulls inside arrays), Hudi starts writing 
Parquet files with the following schema:
   ```
      required group internal_list (LIST) {
       repeated group list {
         required int64 element;
       }
     }
   ```
   But files produced before setting 
`"spark.hadoop.parquet.avro.write-old-list-structure", "false"` have the 
following schema:
   ```
     required group internal_list (LIST) {
       repeated int64 array;
     }
   ```
   Hudi (0.14.x at least) fails to read records from such files, throwing the 
exception 
   `Caused by: java.lang.RuntimeException: Null-value for required field: `
   
   This happens even though the array contents are not null (they cannot be 
null, in fact, since Avro requires 
`spark.hadoop.parquet.avro.write-old-list-structure` = `false` to write 
`null`s).
   
   ### Expected behavior 
   Taken from Hudi 0.12.1 (not sure what exactly broke it):
   1. If I have a file with the 2-level structure and an update arrives with 
`"spark.hadoop.parquet.avro.write-old-list-structure", "false"` (whether or not 
the arrays contain nulls, the result is the same), the file is rewritten into 
the 3-level structure. (**fails in 0.14.1**)
   2. If I have a file with the 3-level structure containing nulls and an 
update comes (with or without nulls), it is read and written correctly.
   
   A simple reproduction of the issue can be found here:
   https://github.com/VitoMakarevich/hudi-issue-014
   
   Most likely, the problem appeared after a Hudi change that made values from 
the Hadoop conf propagate into the Reader instance (they were probably not 
propagated before).
   
   ### Change Logs
   Added an explicit override of 
`spark.hadoop.parquet.avro.write-old-list-structure` = `true` when the file 
being read is old (has the 2-level structure).
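
To illustrate the idea only (this is not the code in this PR, and 
`list_levels` / `reader_overrides` are hypothetical names), here is a minimal 
Python sketch that classifies the LIST groups in a Parquet schema string as 
2-level or 3-level and derives the override from that. It assumes the schema is 
available as text in the form shown above and handles only the two layouts from 
this description; the additional backward-compatibility cases in the Parquet 
spec (for example, a repeated group that is itself the element) are ignored.

```python
import re

# Hypothetical helpers for illustration; not part of Hudi or parquet-mr.
# Matches a LIST-annotated group and captures its name plus the type and
# name of the first repeated child field.
LIST_GROUP = re.compile(
    r"group\s+(\w+)\s+\(LIST\)\s*\{\s*repeated\s+(\w+)\s+(\w+)"
)

# Property name exactly as used in this description.
OLD_LIST_KEY = "spark.hadoop.parquet.avro.write-old-list-structure"

OLD_SCHEMA = """
required group internal_list (LIST) {
  repeated int64 array;
}
"""

NEW_SCHEMA = """
required group internal_list (LIST) {
  repeated group list {
    required int64 element;
  }
}
"""


def list_levels(schema_text):
    """Map each LIST group name to 2 (legacy layout) or 3 (wrapped element)."""
    levels = {}
    for name, repeated_type, _child in LIST_GROUP.findall(schema_text):
        # Simplification: a repeated primitive directly under the LIST group
        # is treated as 2-level, a repeated group as 3-level.
        levels[name] = 3 if repeated_type == "group" else 2
    return levels


def reader_overrides(schema_text):
    """Return the config override to apply when reading this file, if any."""
    if 2 in list_levels(schema_text).values():
        return {OLD_LIST_KEY: "true"}
    return {}


if __name__ == "__main__":
    print(list_levels(OLD_SCHEMA), reader_overrides(OLD_SCHEMA))
    print(list_levels(NEW_SCHEMA), reader_overrides(NEW_SCHEMA))
```

A real implementation would inspect the file's Parquet schema object rather 
than its text form; the string matching here only keeps the sketch 
self-contained.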
   
   
   ### Impact
   
   Running tests to ensure that no unexpected issues propagate.
   
   ### Risk level (write none, low medium or high below)
   
   medium 
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
     ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
     changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
