vinothchandar commented on issue #1362: HUDI-644 Enable user to get checkpoint 
from previous commits in DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1362#issuecomment-596362404
 
 
   Okay, caught up now.
   
   Firstly, writing in parallel from two jobs is dangerous, since Hudi does 
not support multi-writer access. I would advise against it (although you could 
probably hack it into working if you tried hard enough).
   
   @garyli1019 we can definitely add tooling to generate checkpoints in the 
format that DeltaStreamer expects. But I would like to decouple that from 
DeltaStreamer itself. I favor keeping it simple: a single knob for the user 
who wants to override the checkpoint. I believe there is already an option to 
override the checkpoint:
   ```
   /**
    * Resume Delta Streamer from this checkpoint.
    */
   @Parameter(names = {"--checkpoint"}, description = "Resume Delta Streamer from this checkpoint.")
   public String checkpoint = null;
   ```
   
   >> I need a robust way to generate the checkpoint from kafka-connect-hdfs 
managed files and kafka-connect itself sometimes having an issue to retrieve 
checkpoint when the Kafka partition number was large
   
   Would like to understand this more in general. For DFS sources, all you 
need is a timestamp, right? And for Kafka, you need to call 
`consumer.offsetsForTimes()` to get the per-partition offsets to override from.
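
   To make the Kafka case concrete, here is a rough sketch of the kind of 
standalone tooling I have in mind. It assumes the per-partition offsets have 
already been looked up (e.g. from the result of 
`KafkaConsumer.offsetsForTimes`), and it assumes the checkpoint string looks 
like `topic,partition:offset,partition:offset,...`; please verify that format 
against the Kafka source code before relying on it. The class and method names 
here are hypothetical, not existing Hudi APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class CheckpointSketch {

    /**
     * Builds a checkpoint string from per-partition offsets.
     * ASSUMPTION: the expected format is "<topic>,<partition>:<offset>,...";
     * this is an illustration, not a confirmed DeltaStreamer contract.
     */
    public static String formatKafkaCheckpoint(String topic, Map<Integer, Long> offsetsByPartition) {
        if (offsetsByPartition.isEmpty()) {
            throw new IllegalArgumentException("No partition offsets supplied for topic " + topic);
        }
        // Sort by partition id so the output is deterministic.
        String parts = new TreeMap<>(offsetsByPartition).entrySet().stream()
                .map(e -> e.getKey() + ":" + e.getValue())
                .collect(Collectors.joining(","));
        return topic + "," + parts;
    }

    public static void main(String[] args) {
        // In real tooling these values would come from consumer.offsetsForTimes(...),
        // mapping each TopicPartition to the offset found for the chosen timestamp.
        Map<Integer, Long> offsets = new HashMap<>();
        offsets.put(1, 250L);
        offsets.put(0, 100L);
        offsets.put(2, 75L);
        System.out.println(formatKafkaCheckpoint("events", offsets));
        // prints: events,0:100,1:250,2:75
    }
}
```

   The resulting string would then be passed via `--checkpoint`, which keeps 
the generation logic outside of DeltaStreamer itself.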
   
   
   
