lewyh commented on issue #7145:
URL: https://github.com/apache/hudi/issues/7145#issuecomment-1304789269
Thanks, I can confirm that adding a dummy field to the struct avoids this
issue. However, two things to note:
- The dummy field needs to exist when the table is created. If you try to add
the dummy field via schema evolution, the same error persists, since it is
caused by reading the existing data. So anyone creating a table with this kind
of structure needs to be aware of this issue _before_ they create their table.
- I need to set
`config("spark.hadoop.parquet.avro.write-old-list-structure", False)` to
resolve a separate issue with arrays containing NULL values. When this config
is set, the above fix no longer works: regardless of the presence of a dummy
field, the error still appears, except that instead of `Can't redefine: array`
it now reads `Can't redefine: list`. I believe this means that Hudi is
unusable for users who need to support NULL values in arrays and have Structs
within Arrays within Structs.
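For anyone hitting the same error, the two workarounds above can be sketched as
follows. This is a minimal illustration, not code from the report: the field
names (`outer`, `items`, `value`, `_dummy`) and the schema shape are
assumptions chosen only to show the Structs-within-Arrays-within-Structs
pattern, and a real job would pass the schema and confs to a SparkSession with
the Hudi bundle on the classpath.

```python
# Sketch of the two workarounds discussed above. All field names here
# are illustrative, not from the original report.

# Structs within Arrays within Structs -- the shape that triggers
# "Can't redefine: array". The extra `_dummy` field must be present in
# the innermost struct when the table is first created; adding it later
# via schema evolution does not help, because the error is raised while
# reading the already-written data.
SCHEMA_DDL = (
    "id STRING, "
    "outer STRUCT<items: ARRAY<STRUCT<value: STRING, _dummy: STRING>>>"
)

# Spark conf needed when arrays may contain NULL values. Per the comment
# above, once this is set the dummy-field workaround stops working and
# the error changes from "Can't redefine: array" to "Can't redefine: list".
SPARK_CONFS = {
    "spark.hadoop.parquet.avro.write-old-list-structure": "false",
}

def apply_confs(builder):
    """Apply the confs to a SparkSession.Builder-like object."""
    for key, value in SPARK_CONFS.items():
        builder = builder.config(key, value)
    return builder
```

Note the trade-off documented above: the dummy field and the
`write-old-list-structure` conf each work around one problem, but they cannot
currently be combined.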
Perhaps this is worth a note in the Hudi docs? The comprehensive schema
evolution documentation is what originally attracted us to Hudi, so a warning
about these situations might help others avoid this pitfall.