[GitHub] [hudi] gtwuser commented on issue #2265: Arrays with nulls in them result in broken parquet files

GitBox Fri, 27 May 2022 21:19:08 -0700


gtwuser commented on issue #2265:
URL: https://github.com/apache/hudi/issues/2265#issuecomment-1140167813


   > I also encountered this issue, in my case upgrading to 0.9.0 and setting 
`parquet.avro.write-old-list-structure` to false helped both for MOR and COW 
table. When using 0.8.0 issue still persists.
   > 
   > In my case I had two problems:
   > 
   > 1. Schema evolution in complex type column from `array<struct<col1>>` to 
`array<struct<col1, col2>>`, when reading col2 should be null for the first 
record, hudi was failing to read table.
   > 2. While doing upserts, the scenario was like this: do initial load with  
`array<struct<col1>>`, upsert records with exact same data. Hudi was creating 
avro file with different type for this column: `array<string>`. It was failing 
at avro parquet schema conversion, hive sync was failing also because of the 
schema change.
   
   @kazdy can you please share the environment details you used when fixing 
this issue? We are still facing it and want to confirm whether it is due to 
our configuration. I have raised issue #5701 as well. 
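   
   For reference, a minimal sketch of how the setting named in the quoted 
comment could be passed to a PySpark job. It assumes the property is forwarded 
to the Hadoop configuration through Spark's `spark.hadoop.` prefix; 
`build_spark_conf` and `apply_conf` are hypothetical helper names used only for 
illustration, and this is not confirmed to be the configuration @kazdy used.

```python
# Sketch only: collects the Spark settings discussed above into one dict.
# The "spark.hadoop." prefix is Spark's standard way to forward a property
# into the Hadoop Configuration, which is where parquet-avro reads it.


def build_spark_conf(write_old_list_structure: bool = False) -> dict:
    """Return Spark conf entries for a Hudi write (hypothetical helper)."""
    return {
        # The Hudi Spark quick start asks for the Kryo serializer.
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        # parquet-avro option from the quoted comment. "false" selects the
        # newer 3-level list layout instead of the legacy 2-level one.
        "spark.hadoop.parquet.avro.write-old-list-structure":
            str(write_old_list_structure).lower(),
    }


def apply_conf(builder, conf: dict):
    """Apply each entry to a SparkSession builder via builder.config(k, v)."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder


# Usage (needs pyspark and the Hudi bundle on the classpath):
#   from pyspark.sql import SparkSession
#   spark = apply_conf(SparkSession.builder, build_spark_conf()).getOrCreate()
```

   If the session already exists when the job script starts, the same property 
can usually be set on the live Hadoop configuration instead, for example with 
`spark.sparkContext._jsc.hadoopConfiguration().set(...)`; whether that takes 
effect for a given Hudi write path is an assumption worth verifying.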
   
   **Environment Description**
   
   AWS Glue 3.0
   
   Hudi version : 0.10.1
   
   Spark version : 3.1.2
   
   Running on Docker? (yes/no) : no, we are running Glue jobs using PySpark
   
   We even tried downgrading to 0.9.0, but we still got the same error as 
described in #5701.
   **JARs used**:
   
   1. httpclient-4.5.13.jar,
   2. hudi-spark3-bundle_2.12-0.9.0.jar,
   3. spark-avro_2.12-3.1.1.jar,
   4. libfb303-0.9.3.jar,
   5. calcite-core-1.16.0.jar
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
