lewyh commented on issue #7145:
URL: https://github.com/apache/hudi/issues/7145#issuecomment-1304789269
Thanks, I can confirm that adding a dummy field to the struct avoids this
issue. However, two things to note:
- The dummy field needs to exist when the table is created. If you try to add
the dummy field via schema evolution, the same error persists, since it is
caused by reading the existing data. So anyone creating a table with this kind
of structure needs to be aware of this issue _before_ they create their table.
- I need to set
`config("spark.hadoop.parquet.avro.write-old-list-structure", False)` to
resolve a separate issue with arrays containing NULL values. When this config
is set, the above fix no longer works: regardless of the presence of a dummy
field, the error still appears, except that instead of `Can't redefine: array`
it now reads `Can't redefine: list`. I believe this means that Hudi is
unusable for users who need to support NULL values in arrays and have Structs
within Arrays within Structs.
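For anyone hitting the same error, the two workarounds above can be sketched as
follows. This is a minimal illustration, not code from the report: the field
names (`outer`, `items`, `value`, `_dummy`) and the schema shape are
assumptions chosen only to show the Structs-within-Arrays-within-Structs
pattern, and a real job would pass the schema and confs to a SparkSession with
the Hudi bundle on the classpath.

```python
# Sketch of the two workarounds discussed above. All field names here
# are illustrative, not from the original report.

# Structs within Arrays within Structs -- the shape that triggers
# "Can't redefine: array". The extra `_dummy` field must be present in
# the innermost struct when the table is first created; adding it later
# via schema evolution does not help, because the error is raised while
# reading the already-written data.
SCHEMA_DDL = (
    "id STRING, "
    "outer STRUCT<items: ARRAY<STRUCT<value: STRING, _dummy: STRING>>>"
)

# Spark conf needed when arrays may contain NULL values. Per the comment
# above, once this is set the dummy-field workaround stops working and
# the error changes from "Can't redefine: array" to "Can't redefine: list".
SPARK_CONFS = {
    "spark.hadoop.parquet.avro.write-old-list-structure": "false",
}

def apply_confs(builder):
    """Apply the confs to a SparkSession.Builder-like object."""
    for key, value in SPARK_CONFS.items():
        builder = builder.config(key, value)
    return builder
```

Note the trade-off documented above: the dummy field and the
`write-old-list-structure` conf each work around one problem, but they cannot
currently be combined.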
Perhaps this is worth a note in the Hudi docs? The comprehensive schema
evolution documentation is what originally attracted us to Hudi, so a warning
about these situations might help others avoid this pitfall.