parisni commented on issue #6558: URL: https://github.com/apache/hudi/issues/6558#issuecomment-1279691815
Thanks @alexeykudinkin, this clarifies things. Can you confirm the Avro schemas come from the last commit in the timeline?

Also, the reason our field changed case over time has been identified. The Hive metastore is case-insensitive, so when you populate it with upper-case column names, it returns them in lower case. However, when Spark reads a metastore table for the first time, it infers the schema from the Parquet files and writes that schema back into the metastore as table properties, which are case-sensitive. From then on, Spark reads the schema from those properties. When the properties and the Hive schema diverge (e.g. after schema evolution), Spark falls back to reading only the Hive schema and ignores the properties. This suddenly produces lower-case columns, which ultimately breaks Hudi, as we saw in this issue.

Eventually, we had to recreate the table from scratch. We now avoid feeding the properties by giving Spark read-only access to the metastore.
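
For anyone hitting the same problem: a possible alternative to restricting metastore access (a sketch on my side, not necessarily what was configured above) is Spark's `spark.sql.hive.caseSensitiveInferenceMode` setting, which controls whether the inferred case-sensitive Parquet schema is written back into the table properties:

```properties
# spark-defaults.conf (sketch; exact semantics depend on your Spark version)

# INFER_AND_SAVE (the default): infer the case-sensitive schema from the
# Parquet files and save it into the metastore table properties -- this is
# the write-back behavior described above.
# INFER_ONLY: infer the schema per query but never write it to the metastore.
# NEVER_INFER: always use the (lower-cased) Hive metastore schema.
spark.sql.hive.caseSensitiveInferenceMode  INFER_ONLY
```

With `INFER_ONLY`, Spark still resolves the case-sensitive column names at read time but leaves the metastore untouched, so the properties and the Hive schema cannot diverge after schema evolution.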
