YannByron commented on PR #5436:
URL: https://github.com/apache/hudi/pull/5436#issuecomment-1145952554

   @vinothchandar 
   > Actually what I proposed, everything uses CDC blocks. Just that when we 
are deriving on-the-fly we don't write before and after into the CDC blocks
   
   in this case, do you mean that  only `op` and `_hoodie_record_key` will be 
kept in the cdc block?  then iterator over this cdc block, and get the 
after-image value and the inserted value from the new file (base file or log 
file), get the before-image value and the deleted value from the previous file 
slice. 
   if so, IMO, the cdc blocks in this case can be omitted. Because we can 
iterator the log file or the base file (apply the filter `_hoodie_commit_time` 
= the current commit time), and continue the next operations.
   
   > everything uses CDC blocks.
   in my design that cdc block have the while cdc information, the cdc block 
will be written out only when the `HoodieMergeHandle` is called, not always. 
And other scenarios can re-use the existing files.
   be afraid there is still a gap about this, so i stress this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to