vinothchandar commented on issue #1362: HUDI-644 Enable user to get checkpoint from previous commits in DeltaStreamer
URL: https://github.com/apache/incubator-hudi/pull/1362#issuecomment-596362404

Okay, caught up now.

Firstly, writing in parallel from two jobs is dangerous, because Hudi does not support multi-writer access. I would advise against it (although you could probably hack it to work if you tried hard enough).

@garyli1019 we can definitely add tooling to generate checkpoints in the format DeltaStreamer expects, but I would like to decouple that from DeltaStreamer itself. I favor keeping it simple, with a single knob for users who want to override the checkpoint. I believe there is already an option for that:

```java
/**
 * Resume Delta Streamer from this checkpoint.
 */
@Parameter(names = {"--checkpoint"}, description = "Resume Delta Streamer from this checkpoint.")
public String checkpoint = null;
```

> I need a robust way to generate the checkpoint from kafka-connect-hdfs managed files, and kafka-connect itself sometimes has issues retrieving the checkpoint when the number of Kafka partitions is large

I would like to understand this better in general. For DFS sources, all you need is a timestamp, right? And for Kafka, you need to call `consumer.offsetsForTimes()` and get a set of per-partition offsets to override from.
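To make the Kafka case concrete, here is a minimal sketch of turning per-partition offsets into a string you could pass via `--checkpoint`. It assumes the DeltaStreamer Kafka checkpoint has the shape `topic,partition:offset,partition:offset,...`; please verify that against the Hudi version you run. The class name `KafkaCheckpointFormat` and the topic `impressions` are hypothetical, and the offsets are hard-coded so the example stays self-contained. In a real tool they would come from `KafkaConsumer#offsetsForTimes`.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical helper, not part of Hudi.
public class KafkaCheckpointFormat {

    /**
     * Builds "topic,partition:offset,partition:offset,...".
     * ASSUMPTION: this mirrors the checkpoint format DeltaStreamer's Kafka source
     * stores in commit metadata; check it against your Hudi version.
     *
     * @param topic   Kafka topic name
     * @param offsets partition number -> offset to resume from
     */
    public static String toCheckpoint(String topic, Map<Integer, Long> offsets) {
        StringJoiner joiner = new StringJoiner(",");
        joiner.add(topic);
        // Iterate in partition order so the output is deterministic.
        for (Map.Entry<Integer, Long> e : new TreeMap<>(offsets).entrySet()) {
            if (e.getValue() == null) {
                // offsetsForTimes yields null for a partition with no record at/after the timestamp.
                throw new IllegalArgumentException("no offset for partition " + e.getKey());
            }
            joiner.add(e.getKey() + ":" + e.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // In practice, build a Map<TopicPartition, Long> of partition -> timestamp, call
        //   Map<TopicPartition, OffsetAndTimestamp> found = consumer.offsetsForTimes(query);
        // and take found.get(tp).offset() for each partition. Hard-coded here for illustration.
        Map<Integer, Long> offsets = new TreeMap<>();
        offsets.put(1, 7L);
        offsets.put(0, 42L);
        System.out.println(toCheckpoint("impressions", offsets)); // prints impressions,0:42,1:7
    }
}
```

Keeping this as a separate utility (rather than logic inside DeltaStreamer) fits the single `--checkpoint` knob: the tool only produces a string, and DeltaStreamer just consumes it.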
