vicuna96 commented on issue #5942: URL: https://github.com/apache/hudi/issues/5942#issuecomment-1164752309
Case 1: Hi @nsivabalan, yes, we mean the latter explanation: an incoming record updates _col1_ and _col3_, but _col2_ was populated earlier by a different source, so HDFS already holds a non-default value for it. We want to update _col1_ and _col3_ from this source, but if HDFS already has a value for _col2_, we want to keep it, so that the record written is `incremental.col1, coalesce(incremental.col2, hdfs.col2), incremental.col3`.

If you look at the before and after dataframes for case 1, the record for _key1_ is correctly updated with the incoming columns, and its numberField is correctly taken from HDFS (as 55). So this "partial update" appears to work when the record does not move from one partition to another. For _key3_, however, the partition column itself is updated, and that is where the non-default numberField from HDFS gets nullified (numberField goes from 77 to null for this record after the update).
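To make the expected behaviour concrete, here is a minimal sketch in plain Python of the merge we have in mind. It is not Hudi's payload implementation and calls no Hudi API; `partial_update`, the partition values `p1`/`p2`, and every column value other than the numberField values 55 and 77 are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, Optional

Record = Dict[str, Any]


def partial_update(existing: Optional[Record], incoming: Record) -> Record:
    """Per-field coalesce(incoming, existing): a None in the incoming
    record keeps whatever value is already stored."""
    if existing is None:
        # No stored record for this key: write the incoming record as-is.
        return dict(incoming)
    merged = dict(existing)
    for field, value in incoming.items():
        if value is not None:
            merged[field] = value
    return merged


# Hypothetical stored state, looked up by record key across partitions.
stored = {
    "key1": {"key": "key1", "partition": "p1", "col1": "a", "numberField": 55, "col3": "x"},
    "key3": {"key": "key3", "partition": "p1", "col1": "c", "numberField": 77, "col3": "z"},
}

# Incoming batch: numberField is absent from this source; key3 also changes partition.
incoming = [
    {"key": "key1", "partition": "p1", "col1": "a2", "numberField": None, "col3": "x2"},
    {"key": "key3", "partition": "p2", "col1": "c2", "numberField": None, "col3": "z2"},
]

result = {r["key"]: partial_update(stored.get(r["key"]), r) for r in incoming}
```

With this merge, _key3_ ends up in the new partition while keeping numberField 77, which is the outcome we expect; what we observe instead is numberField becoming null whenever the partition changes.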
