danny0405 commented on pull request #2666:
URL: https://github.com/apache/hudi/pull/2666#issuecomment-818504260


   > sorry for late turn around on reviewing this. We should definitely get 
this before next release.
   > 
   > I am yet to review tests. but few high level thoughts on reviewing source 
code.
   > 
   > * Shouldn't we check schema compatibility? what incase new incoming batch 
is not compatible w/ table schema w/ partial updates set to true? did we cover 
this scenario.
   > * I see we have added support only for COW. should we throw exception if 
the config is set for MOR?
   > * I don't have a good idea of adding sql DML support to hoodie table. But 
if feasible, once such support is added, do you think we can leverage this w/o 
duplicating the work for sql DML. for eg, "update col1 = 'new_york' where col2= 
'123'" Such partial updates should translate from sql layer to this right.
   > * In tests, do verify that schema in commit metadata refers to table 
schema and not incoming partial schema.
   
   I have the same feeling, we should still use the old schema with full fields 
there, for new records with partial values, we can patch them up with a builtin 
placeholder values, and when we pre_combine the old and new, if we encounter 
the placeholder values, use the value from the existing record.
   
   In any case, to be consistent with SQL, please do not modify the schema 
which mess the thing up.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to