nandini57 commented on issue #1582:
URL: https://github.com/apache/incubator-hudi/issues/1582#issuecomment-622970820


   My apologies. Let me try to explain. If I don't upsert the data with each batch where applicable, then when I query the table back it will have duplicates, since batch "n" needs to carry forward data from batches "n-1", "n-2", and so on. To get the latest view of the data I would need to group by upsertKey and take max(commit_time), and doing that group-by on every read won't scale.
   
   Instead, if I could preserve the current value with a deleted identifier in my CustomPayload, and also return both the incoming and the current payload from combineAndGetUpdateValue, I could keep the data required for audit while reads simply filter out records carrying the deleted identifier.
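   To make the idea concrete, here is a rough plain-Java sketch of the combine step I have in mind. The `Payload` class, its `deleted` flag, and a `combineAndGetUpdateValue` that returns a list are all illustrative stand-ins -- Hudi's real `HoodieRecordPayload#combineAndGetUpdateValue` returns a single record, which is exactly why I'm asking about returning both:

```java
import java.util.*;

public class SoftDeletePayloadSketch {
    // Illustrative payload: a value plus a deleted marker readers can filter on.
    record Payload(String value, boolean deleted) {}

    // Instead of discarding the current record, return the incoming value AND
    // the superseded current value tagged with the deleted identifier, so the
    // old data survives for audit while reads drop anything marked deleted.
    static List<Payload> combineAndGetUpdateValue(Payload current, Payload incoming) {
        if (incoming.deleted()) {
            // a delete: keep only the current value, marked deleted, for audit
            return List.of(new Payload(current.value(), true));
        }
        // an update: keep the incoming value plus the old value, tagged
        return List.of(incoming, new Payload(current.value(), true));
    }
}
```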
   
   Does this make sense? Any other ideas? Possibly making copyOldRecord a configurable property, defaulting to false, if that doesn't impact anything else.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

