nsivabalan commented on pull request #2438: URL: https://github.com/apache/hudi/pull/2438#issuecomment-853866564
Good point. Tell me if my understanding of how timestamp-based checkpointing is used is right in general. The user would like to use timestamp-based checkpointing in deltastreamer only for the bootstrap case. From then on, checkpointing uses the regular Kafka checkpoint format of "topicName,0:123,1:456".

If that understanding is correct, then within KafkaOffsetGen we might have to parse the checkpoint as a timestamp the first time (bootstrap), and from the second time onward fall back to the regular checkpoint parsing mechanism. I see we have InitialCheckPointProvider. Let me think about how to go about this and I will get back to you.

For now, this is what I can think of. InitialCheckPointProvider will expose a getCheckpointType() method, and we add it as a property to the configs if initialCheckpointProvider is set, around [here](https://github.com/apache/hudi/blob/f6eee77636223077cfd2ce516f1b8805dfa6e35e/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java#L132). Within readFromSource in DeltaSync, if the checkpoint is fetched from commit metadata, we do not honor this checkpoint type (or we clear the checkpoint type property if it is set). But if it is fetched from cfg.checkPoint, we leave the property as is and let KafkaOffsetGen handle checkpoint parsing.

Let me think through this more. In the meantime, it would be great if you could confirm my understanding of the usage of timestamp-based checkpointing.
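To make the proposal concrete, here is a rough sketch of the two parsing paths and the rule for choosing between them. It is not the actual Hudi code: `CheckpointType`, `ParsedCheckpoint`, `parse` and `effectiveType` are hypothetical names invented for illustration, and it assumes the timestamp checkpoint is a plain epoch-milliseconds string, which has not been decided in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CheckpointParsingSketch {

    // Hypothetical checkpoint types; not part of Hudi's API.
    enum CheckpointType { TIMESTAMP, KAFKA_OFFSETS }

    // Result of parsing either checkpoint form.
    static final class ParsedCheckpoint {
        final CheckpointType type;
        final long timestampMs;           // meaningful only for TIMESTAMP, otherwise -1
        final String topic;               // meaningful only for KAFKA_OFFSETS, otherwise null
        final Map<Integer, Long> offsets; // partition -> offset, empty for TIMESTAMP

        ParsedCheckpoint(CheckpointType type, long timestampMs, String topic,
                         Map<Integer, Long> offsets) {
            this.type = type;
            this.timestampMs = timestampMs;
            this.topic = topic;
            this.offsets = offsets;
        }
    }

    // Rule from the comment: a checkpoint read from commit metadata is always in
    // the regular Kafka format; only a user-supplied checkpoint keeps the
    // configured type.
    static CheckpointType effectiveType(boolean fromCommitMetadata, CheckpointType configured) {
        if (fromCommitMetadata || configured == null) {
            return CheckpointType.KAFKA_OFFSETS;
        }
        return configured;
    }

    static ParsedCheckpoint parse(String checkpoint, CheckpointType type) {
        if (type == CheckpointType.TIMESTAMP) {
            // Assumption: the timestamp checkpoint is epoch milliseconds.
            long ts = Long.parseLong(checkpoint.trim());
            return new ParsedCheckpoint(type, ts, null, new LinkedHashMap<>());
        }
        // Regular format: "topicName,0:123,1:456"
        String[] parts = checkpoint.split(",");
        Map<Integer, Long> offsets = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            String[] partitionAndOffset = parts[i].split(":");
            offsets.put(Integer.parseInt(partitionAndOffset[0].trim()),
                        Long.parseLong(partitionAndOffset[1].trim()));
        }
        return new ParsedCheckpoint(type, -1L, parts[0], offsets);
    }

    public static void main(String[] args) {
        // First run (bootstrap): checkpoint comes from the user config as a timestamp.
        CheckpointType first = effectiveType(false, CheckpointType.TIMESTAMP);
        System.out.println(first + " -> " + parse("1622700000000", first).timestampMs);

        // Later runs: checkpoint comes from commit metadata in the regular format.
        CheckpointType later = effectiveType(true, CheckpointType.TIMESTAMP);
        ParsedCheckpoint p = parse("topicName,0:123,1:456", later);
        System.out.println(later + " -> " + p.topic + " " + p.offsets);
    }
}
```

The point of keeping the decision in one small function is that KafkaOffsetGen would only need to look at a single property to pick the parser, while DeltaSync decides whether that property survives based on where the checkpoint came from.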