nsivabalan commented on pull request #2438:
URL: https://github.com/apache/hudi/pull/2438#issuecomment-853866564


   Good point.
   Please tell me whether my general understanding of how timestamp-based
checkpointing is used is right.
   The user wants to use timestamp-based checkpointing in deltastreamer only
for the bootstrap case.
   From then on, checkpointing uses the regular Kafka checkpoint format of
"topicName,0:123,1:456".
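
   For reference, a minimal sketch of reading the regular format quoted above,
assuming the first comma-separated field is the topic and each later field is
partition:offset. The class and method names are hypothetical and are not taken
from the Hudi code base.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative parser for checkpoints shaped like "topicName,0:123,1:456". */
final class KafkaCheckpointFormat {

    /** Returns the topic name, i.e. the text before the first comma. */
    static String topicOf(String checkpoint) {
        return checkpoint.split(",")[0].trim();
    }

    /** Returns partition -> offset pairs in the order they appear in the string. */
    static Map<Integer, Long> offsetsOf(String checkpoint) {
        String[] parts = checkpoint.split(",");
        if (parts.length < 2) {
            throw new IllegalArgumentException(
                "Expected topic,partition:offset,... but got: " + checkpoint);
        }
        Map<Integer, Long> offsets = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            String[] pair = parts[i].split(":");
            if (pair.length != 2) {
                throw new IllegalArgumentException("Bad partition:offset entry: " + parts[i]);
            }
            offsets.put(Integer.parseInt(pair[0].trim()), Long.parseLong(pair[1].trim()));
        }
        return offsets;
    }
}
```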
   
   If that understanding is correct, then within KafkaOffsetGen we might have
to parse the checkpoint as a timestamp the first time (bootstrap), and from the
second run onward fall back to the regular checkpoint parsing mechanism.
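
   One way this choice could look, as a rough sketch only: treat the
checkpoint as a timestamp when timestamp checkpointing was requested and the
string is purely numeric, and otherwise use the regular format. The
`timestampTypeRequested` flag and all names here are assumptions for
illustration, not existing KafkaOffsetGen code.

```java
/** Illustrative choice between timestamp parsing and regular checkpoint parsing. */
final class CheckpointKindSketch {

    enum Kind { TIMESTAMP, PARTITION_OFFSETS }

    /**
     * @param checkpoint             raw checkpoint string
     * @param timestampTypeRequested hypothetical flag, true only on the bootstrap
     *                               run where the user supplied a timestamp
     */
    static Kind classify(String checkpoint, boolean timestampTypeRequested) {
        // A timestamp checkpoint is assumed to be a plain run of digits (epoch value).
        boolean allDigits = !checkpoint.isEmpty()
            && checkpoint.chars().allMatch(Character::isDigit);
        if (timestampTypeRequested && allDigits) {
            return Kind.TIMESTAMP;
        }
        // Everything else goes through the regular "topic,partition:offset" parser.
        return Kind.PARTITION_OFFSETS;
    }
}
```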
   
   I see we have InitialCheckPointProvider. Let me think about how to go about
this and get back to you. For now, this is what I can think of:
   InitialCheckPointProvider will expose a getCheckpointType() method,
   and if an InitialCheckPointProvider is set, we add the checkpoint type as a
property to the configs around
[here](https://github.com/apache/hudi/blob/f6eee77636223077cfd2ce516f1b8805dfa6e35e/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java#L132).
 
   Within readFromSource in DeltaSync, if the checkpoint is fetched from commit
metadata, we would not honor this checkpoint type (or we would clear the
checkpoint type property if it is set).
   But if it is fetched from cfg.checkPoint, we will leave the property as is
and let KafkaOffsetGen handle checkpoint parsing.
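
   A minimal sketch of that rule, assuming the checkpoint type travels as a
plain property. The property key, class, and method names are hypothetical, and
this is not the actual DeltaSync code.

```java
import java.util.Optional;
import java.util.Properties;

/** Illustrative handling of the checkpoint type depending on where the checkpoint came from. */
final class CheckpointTypeResolution {

    // Hypothetical property key, used only for this sketch.
    static final String CHECKPOINT_TYPE_KEY = "example.deltastreamer.checkpoint.type";

    /**
     * Picks the checkpoint to resume from. A checkpoint found in commit metadata
     * wins and clears the type property, so later parsing uses the regular format.
     * A checkpoint taken from the user config leaves the property untouched.
     */
    static String resolve(Optional<String> fromCommitMetadata,
                          String fromConfig,
                          Properties props) {
        if (fromCommitMetadata.isPresent()) {
            props.remove(CHECKPOINT_TYPE_KEY);
            return fromCommitMetadata.get();
        }
        return fromConfig;
    }
}
```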
   
   Let me think this through some more. In the meantime, it would be great if
you could confirm my understanding of how timestamp-based checkpointing is used.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

