parisni commented on issue #6558: URL: https://github.com/apache/hudi/issues/6558#issuecomment-1279691815
Thanks @alexeykudinkin, this clarifies things. Can you confirm the Avro schemas come from the last commit in the timeline?

Also, the reason our field changed case over time has been identified. The Hive metastore is case-insensitive, so when you populate it with upper-case column names, it returns them in lower case. However, when Spark reads a metastore table for the first time, it infers the schema from the Parquet files and writes that schema back into the metastore as table properties, which are case-sensitive. From then on, Spark reads the schema from those properties. When the properties and the Hive schema diverge (e.g. after schema evolution), Spark falls back to reading only the Hive schema and ignores the properties. This suddenly produces lower-case columns, which ultimately breaks Hudi, as we saw in this issue.

Eventually, we had to recreate the table from scratch. We now avoid feeding the properties by giving Spark read-only access to the metastore.
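
For anyone hitting the same problem: a possible alternative to restricting metastore access (a sketch on my side, not necessarily what was configured above) is Spark's `spark.sql.hive.caseSensitiveInferenceMode` setting, which controls whether the inferred case-sensitive Parquet schema is written back into the table properties:

```properties
# spark-defaults.conf (sketch; exact semantics depend on your Spark version)

# INFER_AND_SAVE (the default): infer the case-sensitive schema from the
# Parquet files and save it into the metastore table properties -- this is
# the write-back behavior described above.
# INFER_ONLY: infer the schema per query but never write it to the metastore.
# NEVER_INFER: always use the (lower-cased) Hive metastore schema.
spark.sql.hive.caseSensitiveInferenceMode  INFER_ONLY
```

With `INFER_ONLY`, Spark still resolves the case-sensitive column names at read time but leaves the metastore untouched, so the properties and the Hive schema cannot diverge after schema evolution.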
