nsivabalan commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-864413719


   Guess we can simplify things. Let me go over some pseudo code of interest. 
   
   within DeltaSync.read()
   ```
   // set right checkpoint value
   if (cfg.checkpoint != null && !commitMetadata.contains(Checkpoint_RESET_Key)) {
      checkpoint = cfg.checkpoint;
   } else if (commitMetadata.contains(Checkpoint_Key)) {
      checkpoint = commitMetadata.get(Checkpoint_Key);
   } else {
      checkpoint = Option.empty();
   }
   ```
   Note that the first if condition deals with Checkpoint_RESET_Key, whereas the else-if condition deals with Checkpoint_Key.
   
   within write() 
   ```
   // towards the end
   commitMetadata.put(Checkpoint_Key, updatedCheckpoint); // checkpoint after writing
   if (cfg.checkpoint != null) {
      commitMetadata.add(Checkpoint_RESET_Key);
   }
   ```
   
   If cfg.checkpoint is set, it is honored only during the first round. At the end of the first batch, we add Checkpoint_RESET_Key to the commitMetadata, so for subsequent batches the checkpoint is parsed from commitMetadata.
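
To make the two-batch behaviour concrete, here is a minimal, self-contained Java sketch of the pseudo code above. It is not the DeltaSync implementation: the class name, the method names, and the two key strings are placeholders chosen for illustration, and the commit metadata is modelled as a plain map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the checkpoint handling described above.
class CheckpointSketch {
    // Placeholder key names, assumed for this sketch only.
    static final String CHECKPOINT_KEY = "checkpoint.key";
    static final String CHECKPOINT_RESET_KEY = "checkpoint.reset.key";

    // Mirrors read(): decide which checkpoint to resume from.
    static Optional<String> resolveCheckpoint(String cfgCheckpoint,
                                              Map<String, String> commitMetadata) {
        if (cfgCheckpoint != null && !commitMetadata.containsKey(CHECKPOINT_RESET_KEY)) {
            // First round: the configured checkpoint has not been consumed yet.
            return Optional.of(cfgCheckpoint);
        } else if (commitMetadata.containsKey(CHECKPOINT_KEY)) {
            // Later rounds: resume from what the last commit recorded.
            return Optional.of(commitMetadata.get(CHECKPOINT_KEY));
        }
        return Optional.empty();
    }

    // Mirrors write(): record the new checkpoint and mark the config value as used.
    static Map<String, String> commit(String cfgCheckpoint, String updatedCheckpoint) {
        Map<String, String> commitMetadata = new HashMap<>();
        commitMetadata.put(CHECKPOINT_KEY, updatedCheckpoint);
        if (cfgCheckpoint != null) {
            commitMetadata.put(CHECKPOINT_RESET_KEY, cfgCheckpoint);
        }
        return commitMetadata;
    }
}
```

With an empty metadata map and cfg.checkpoint set, resolveCheckpoint returns the configured value. After one commit, the same call returns the value stored under the checkpoint key instead.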
   
   With this PR, the only addition is that we are introducing a new checkpoint type. Let me propose a simple add-on to the above code that would work for us.
   
   within DeltaSync.read()
   ```
   // set right checkpoint value
   boolean resetCheckpointType = true;
   if (cfg.checkpoint != null && !commitMetadata.contains(Checkpoint_RESET_Key)) {
      checkpoint = cfg.checkpoint;
      resetCheckpointType = false;
   } else if (commitMetadata.contains(Checkpoint_Key)) {
      checkpoint = commitMetadata.get(Checkpoint_Key);
   } else {
      checkpoint = Option.empty();
   }
   if (resetCheckpointType) {
      // reset checkpoint type if set
   }
   ```
   
   No other changes are required. This is based on the assumption that Checkpoint_RESET_Key and the checkpoint type go hand in hand. During the first batch, the checkpoint type could be set, and there won't be any Checkpoint_RESET_Key set. From the 2nd batch onwards, it should be the reverse: the checkpoint type should not be set, but Checkpoint_RESET_Key should be part of the commit metadata. Given this assumption, we don't really need to add the checkpoint type to commitMetadata, and can still decide whether to use the checkpoint type or not.
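
As a rough illustration of this proposal, the sketch below returns both the resolved checkpoint and the checkpoint type that should actually be applied. Everything in it is assumed for the example: the class, the Resolved holder, the key strings, and the idea of passing the configured type in as a plain string. It only demonstrates that the type is honored in the same branch that honors cfg.checkpoint and is dropped otherwise.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the proposed add-on; not the actual DeltaSync code.
class CheckpointTypeSketch {
    // Placeholder key names, assumed for this sketch only.
    static final String CHECKPOINT_KEY = "checkpoint.key";
    static final String CHECKPOINT_RESET_KEY = "checkpoint.reset.key";

    // Simple holder for the outcome of read().
    static final class Resolved {
        final Optional<String> checkpoint;
        final Optional<String> checkpointType;

        Resolved(Optional<String> checkpoint, Optional<String> checkpointType) {
            this.checkpoint = checkpoint;
            this.checkpointType = checkpointType;
        }
    }

    static Resolved resolve(String cfgCheckpoint, String cfgCheckpointType,
                            Map<String, String> commitMetadata) {
        Optional<String> checkpoint;
        boolean resetCheckpointType = true;
        if (cfgCheckpoint != null && !commitMetadata.containsKey(CHECKPOINT_RESET_KEY)) {
            // First batch: honor the configured checkpoint together with its type.
            checkpoint = Optional.of(cfgCheckpoint);
            resetCheckpointType = false;
        } else if (commitMetadata.containsKey(CHECKPOINT_KEY)) {
            checkpoint = Optional.of(commitMetadata.get(CHECKPOINT_KEY));
        } else {
            checkpoint = Optional.empty();
        }
        // From the 2nd batch onwards the configured type is ignored.
        Optional<String> type = resetCheckpointType
                ? Optional.<String>empty()
                : Optional.ofNullable(cfgCheckpointType);
        return new Resolved(checkpoint, type);
    }
}
```

Because the type is derived entirely from which branch was taken, nothing about it has to be persisted in the commit metadata.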
   
   
   
   
   

