kazdy commented on issue #5873:
URL: https://github.com/apache/hudi/issues/5873#issuecomment-1157669265

   @xiarixiaoyao I was hoping that with schema reconciliation "default values 
will be injected to missing fields" as per the docs:
   
   > When a new batch of write has records with old schema, but latest table 
schema got evolved, this config will upgrade the records to leverage latest 
table schema(default values will be injected to missing fields). If not, the 
write batch would fail.
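
   For context, a minimal sketch of how this setting might be passed as a Spark 
datasource write option. The key `hoodie.datasource.write.reconcile.schema` is 
not named in this comment and is my assumption about which config the quoted 
description belongs to; the table name is a hypothetical placeholder.

```python
# Sketch only: options dict as it might be passed to a Hudi write via
# df.write.format("hudi").options(**hudi_write_options).
# The reconcile key is an assumption about the config the docs quote describes.
hudi_write_options = {
    "hoodie.table.name": "my_table",  # hypothetical table name
    "hoodie.datasource.write.reconcile.schema": "true",
}
```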
   
   The scenario I described does not happen when a batch has a missing column 
but no new column. In that case Hudi injects null into the missing column and 
the column is not removed from the table in the metastore.
   
   The behavior I'm looking for is this:
   incoming data doesn't contain every column in the table -> those columns 
are simply assigned null/default values
   Other similar frameworks allow this, so I guess Hudi could do the same, 
possibly as an option guarded by a config for users who prefer to enforce the 
schema more strictly.
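
   As a rough illustration of the behavior described above, here is a sketch in 
plain Python. It is not Hudi code and does not reflect how Hudi implements 
reconciliation; `reconcile`, the column names, and the sample record are all 
hypothetical. Columns missing from the incoming record are filled with null (or 
a supplied default), and new columns are appended instead of replacing existing 
ones.

```python
from typing import Any, Dict, List, Optional, Tuple


def reconcile(
    table_columns: List[str],
    record: Dict[str, Any],
    defaults: Optional[Dict[str, Any]] = None,
) -> Tuple[List[str], Dict[str, Any]]:
    """Hypothetical helper: merge an incoming record into the table schema."""
    defaults = defaults or {}
    # Columns present in the record but not yet in the table are appended.
    new_columns = [c for c in record if c not in table_columns]
    merged_columns = list(table_columns) + new_columns
    # Columns missing from the record get their default, or None (null).
    filled = {c: record.get(c, defaults.get(c)) for c in merged_columns}
    return merged_columns, filled


# A batch that lacks "price" and adds "discount" in the same write.
table_columns = ["id", "name", "price"]
incoming = {"id": 1, "name": "a", "discount": 0.1}
merged_columns, row = reconcile(table_columns, incoming)
```

Here `price` stays in the merged schema with a null value while `discount` is 
added, which is the outcome I would expect from the scenario in this issue.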
   
   I also found a comment in another Hudi issue that makes me think my 
scenario should work:
   > @TarunMootala can you upgrade Hudi to 0.10.1. this can reconcile the schema 
wherever the new field is put in. Spark-SQL is still having some problems that 
the new middle field can't be shown.
   > But I test in master branch, all of the problems above have gone.
   
   _Originally posted by @YannByron in 
https://github.com/apache/hudi/issues/4914#issuecomment-1063623677_
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
