YannByron commented on PR #5436: URL: https://github.com/apache/hudi/pull/5436#issuecomment-1145952554
@vinothchandar

> Actually what I proposed, everything uses CDC blocks. Just that when we are deriving on-the-fly we don't write before and after into the CDC blocks

In this case, do you mean that only `op` and `_hoodie_record_key` will be kept in the CDC block? We would then iterate over this CDC block, get the after-image and the inserted values from the new file (base file or log file), and get the before-image and the deleted values from the previous file slice. If so, IMO the CDC blocks can be omitted in this case, because we can iterate over the log file or the base file directly (applying the filter `_hoodie_commit_time` = the current commit time) and continue with the next operations.

> everything uses CDC blocks.

In my design, the CDC block holds the whole CDC information, and it is written out only when `HoodieMergeHandle` is called, not always. The other scenarios can reuse the existing files. I'm afraid there is still a gap between us on this point, so I want to stress it.
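
To make the "derive on the fly without a CDC block" idea concrete, here is a minimal Python sketch. It is not Hudi code: file slices are modeled as plain lists of dicts, `derive_cdc` and the `value` column are hypothetical, and the `i`/`u`/`d` op codes are illustrative. Only the `_hoodie_record_key` and `_hoodie_commit_time` field names come from the discussion above. It also assumes the new slice is a fully rewritten base file, so a key missing from it means a delete; log files with delete blocks would need different handling.

```python
# Hypothetical, simplified model: a file slice is a list of dict records.
# Only `_hoodie_record_key` and `_hoodie_commit_time` are taken from the
# discussion; every other name here is an illustrative assumption.

def derive_cdc(prev_slice, new_slice, commit_time):
    """Derive change rows for one commit without reading a CDC block."""
    prev_by_key = {r["_hoodie_record_key"]: r for r in prev_slice}
    new_by_key = {r["_hoodie_record_key"]: r for r in new_slice}
    changes = []

    # Records stamped with the current commit time are inserts or updates.
    # The after-image comes from the new file, the before-image (if any)
    # from the previous file slice.
    for key, after in new_by_key.items():
        if after["_hoodie_commit_time"] != commit_time:
            continue  # untouched by this commit
        before = prev_by_key.get(key)
        op = "u" if before is not None else "i"
        changes.append({"op": op, "key": key, "before": before, "after": after})

    # Assumption: the new slice is a full rewrite, so a key that vanished
    # was deleted; its before-image comes from the previous file slice.
    for key, before in prev_by_key.items():
        if key not in new_by_key:
            changes.append({"op": "d", "key": key, "before": before, "after": None})

    return changes


prev_slice = [
    {"_hoodie_record_key": "k1", "_hoodie_commit_time": "001", "value": 1},
    {"_hoodie_record_key": "k2", "_hoodie_commit_time": "001", "value": 2},
    {"_hoodie_record_key": "k3", "_hoodie_commit_time": "001", "value": 3},
]
new_slice = [
    {"_hoodie_record_key": "k1", "_hoodie_commit_time": "002", "value": 10},
    {"_hoodie_record_key": "k2", "_hoodie_commit_time": "001", "value": 2},
    {"_hoodie_record_key": "k4", "_hoodie_commit_time": "002", "value": 4},
]

changes = derive_cdc(prev_slice, new_slice, "002")
for change in changes:
    print(change["op"], change["key"])
```

In this toy model the commit-time filter alone identifies the inserted and updated records, which is the reason given above for omitting a CDC block that would carry only `op` and `_hoodie_record_key`.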
