vinothchandar commented on PR #5436:
URL: https://github.com/apache/hudi/pull/5436#issuecomment-1150688635

   Caught up on the discussion. 
   
   HoodieCreateHandle is used directly by the insert path of both COW and MOR. 
The compactor uses MergeHandle, not CreateHandle, and we can pass a flag in 
that path to make it skip writing any CDC flags.
    
   <img width="631" alt="image" 
src="https://user-images.githubusercontent.com/1179324/172764081-bb3d0d5a-d268-48f4-b84a-197867b943fd.png">
   
   
   In our discussions, we are assuming knowledge of inserts and updates 
based on the write handle used. But at query time, we don't know which 
handle produced a given file slice. So we need to design this more 
generically, at the file group/slice level. 
   
   
   My proposal in 
https://github.com/apache/hudi/pull/5436#issuecomment-1143251715 was different 
from what you wrote, @danny0405:
    
   ```
   1. for create handle, we deduce the _op directly from the record because 
they are all INSERTs
   2. for merge handle, we can deduce the before image on the fly when reading 
by comparing two 
       different versions of file slice and there is no need to write another 
cdc block.
   ```
   For 1 - at query time we don't know whether a file slice was produced 
by a create handle or a merge handle. So for a create handle with N records, we 
write a CDC log block with N entries, each with `{op=I, before=null}`. 
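
   A minimal sketch of that create-handle case, to make the shape of the 
block concrete. `CreateCdcSketch`, `Op`, `Entry` and `entriesFor` are 
hypothetical names for illustration only, not Hudi classes, and records are 
reduced to their keys:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names are Hudi classes or APIs.
final class CreateCdcSketch {
    enum Op { I, U, D }

    // One CDC entry: the operation, the record key, and the before image.
    static final class Entry {
        final Op op;
        final String key;
        final String before; // null means "no before image"

        Entry(Op op, String key, String before) {
            this.op = op;
            this.key = key;
            this.before = before;
        }
    }

    // A create handle only writes new records, so all N entries are
    // {op=I, before=null}. The block is still written so that a reader does
    // not need to know which handle produced the file slice.
    static List<Entry> entriesFor(List<String> recordKeys) {
        List<Entry> block = new ArrayList<>(recordKeys.size());
        for (String key : recordKeys) {
            block.add(new Entry(Op.I, key, null));
        }
        return block;
    }
}
```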
   
   For the merge handle, the records can be updates/deletes and even inserts. 
For an update/delete, the before image can be in the base file of the current 
slice, or in a base/log file of the previous slice. There are two options.  
   1. if `hoodie.cdc.supplemental.logging=true`, we perform the read to obtain 
the before images for all updated and deleted records and write a CDC block 
with one entry for each: `{op=U or D, before=value_read_from_base_or_previous_slice}`.
   2. if `hoodie.cdc.supplemental.logging=false`, we don't compute the 
before image at write time and just encode `{op=U or D, before=null or some sentinel}`. 
During read, if op = U or D and there is no before image, we go ahead and read 
to compute `before`. The `after` image can be read normally from the file slice 
for `I` and `U`.
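
   The two options can be sketched as below. This is only an illustration 
under simplifying assumptions: `MergeCdcSketch`, `write` and `resolveBefore` 
are hypothetical names, not Hudi APIs; the `previous` map stands in for 
whatever base/log files hold the old record values; and `null` plays the role 
of the sentinel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names are Hudi classes or APIs.
final class MergeCdcSketch {
    enum Op { I, U, D }

    static final class Entry {
        final Op op;
        final String key;
        final String before; // null means "not captured at write time"

        Entry(Op op, String key, String before) {
            this.op = op;
            this.key = key;
            this.before = before;
        }
    }

    // Write side. `supplementalLogging` mirrors hoodie.cdc.supplemental.logging:
    // when true, before images for U/D are looked up now and stored in the block;
    // when false, only the op is recorded and `before` is left null.
    static List<Entry> write(Map<String, Op> changes,
                             Map<String, String> previous,
                             boolean supplementalLogging) {
        List<Entry> block = new ArrayList<>();
        for (Map.Entry<String, Op> change : changes.entrySet()) {
            Op op = change.getValue();
            String before = null;
            if (supplementalLogging && op != Op.I) {
                before = previous.get(change.getKey());
            }
            block.add(new Entry(op, change.getKey(), before));
        }
        return block;
    }

    // Read side. Inserts never have a before image. For U/D, use the stored
    // image if present, otherwise compute it from the previous data.
    static String resolveBefore(Entry entry, Map<String, String> previous) {
        if (entry.op == Op.I) {
            return null;
        }
        return entry.before != null ? entry.before : previous.get(entry.key);
    }
}
```

   Either way the reader ends up with the same before image; the flag only 
decides whether the lookup cost is paid at write time or at query time.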
   
   Hope that helps.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

