vicuna96 commented on issue #5942:
URL: https://github.com/apache/hudi/issues/5942#issuecomment-1164752309

   Case 1: Hi @nsivabalan, yes, we mean the latter explanation: an incoming 
record updates _col1_ and _col3_, but _col2_ was populated earlier by a 
different source, so hdfs already holds a non-default value for it. We want to 
update _col1_ and _col3_ from this source while keeping any existing value of 
_col2_ in hdfs, so that the record written is 
`incremental.col1, coalesce(incremental.col2, hdfs.col2), incremental.col3`. 
In the before and after dataframes shown for case 1, the record for key1 is 
correctly updated with the incoming columns, and the value of numberField is 
correctly taken from hdfs (55).
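
   The merge rule described above can be sketched in plain Python. This is 
only an illustration of the intended semantics, not Hudi's payload 
implementation: the function names `coalesce` and `merge_record`, the 
`keep_if_missing` parameter, and the sample string values are hypothetical. 
Only the 55 comes from the case described here.

```python
from typing import Any, Dict, Iterable, Optional

Record = Dict[str, Optional[Any]]


def coalesce(*values: Optional[Any]) -> Optional[Any]:
    """Return the first value that is not None, or None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None


def merge_record(
    incoming: Record,
    stored: Optional[Record],
    keep_if_missing: Iterable[str] = ("col2",),  # hypothetical parameter
) -> Record:
    """Take every field from the incoming record, except that the fields in
    keep_if_missing fall back to the stored value when the incoming one is None."""
    merged = dict(incoming)
    if stored is not None:
        for field in keep_if_missing:
            merged[field] = coalesce(incoming.get(field), stored.get(field))
    return merged


# Illustrative records: col2 was written earlier by a different source.
stored = {"col1": "old1", "col2": 55, "col3": "old3"}
incoming = {"col1": "new1", "col2": None, "col3": "new3"}
merged = merge_record(incoming, stored)
```

   With these inputs, `merged` keeps the stored _col2_ while _col1_ and 
_col3_ come from the incoming record.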
   
   So it seems to us that this "partial update" works when the record stays 
in the same partition, as evidenced by the update to the record with _key1_. 
For _key3_, however, the update changes the partition column, and that is 
where the non-default numberField from hdfs gets nullified (numberField goes 
from 77 to null for this record after the update).
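
   One way to spot this pattern in before and after snapshots is sketched 
below. It assumes each snapshot has been collected into a dict keyed by record 
key; the helper name `nullified_after_partition_move`, the `partition` field 
name, and the partition values are hypothetical. Only the keys and the 
numberField values (55, 77, null) come from this comment.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Optional[Any]]


def nullified_after_partition_move(
    before: Dict[str, Record],
    after: Dict[str, Record],
    partition_field: str,
    value_field: str,
) -> List[str]:
    """Return the keys whose partition changed between the two snapshots and
    whose value_field went from a non-null value to None."""
    suspects = []
    for key, old in before.items():
        new = after.get(key)
        if new is None:
            continue  # record is absent after the update; not this pattern
        moved = old.get(partition_field) != new.get(partition_field)
        lost = old.get(value_field) is not None and new.get(value_field) is None
        if moved and lost:
            suspects.append(key)
    return sorted(suspects)


# Partition values "p1" and "p2" are made up for illustration.
before = {
    "key1": {"partition": "p1", "numberField": 55},
    "key3": {"partition": "p1", "numberField": 77},
}
after = {
    "key1": {"partition": "p1", "numberField": 55},
    "key3": {"partition": "p2", "numberField": None},
}
suspects = nullified_after_partition_move(before, after, "partition", "numberField")
```

   Here only _key3_ is flagged, since _key1_ stays in its partition and keeps 
its numberField.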

