[
https://issues.apache.org/jira/browse/HUDI-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346537#comment-17346537
]
Nishith Agarwal commented on HUDI-1910:
---------------------------------------
Some background information on current checkpointing
HoodieDeltaStreamer takes the new source checkpoint (eg. Kafka) using
[https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/KafkaOffsetGen.java]
and adds it to the commit metadata here ->
[https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L461]
Possible approaches to implement this
# Keep the commit metadata checkpoint as is, but also commit this checkpoint
to Kafka.
# Use Kafka based checkpoint instead of commit metadata based
(2) Requires refactoring since there are no pluggable checkpoint mechanisms
right now in hudi.
(1) Can be done fairly easily, by adding a callback post commit operation like
this ->
[https://github.com/apache/hudi/blob/fcedbfcb58109f6af0b0a1e69915b40a0fcd5ccb/hudi-utilities/src/main/java/org/apache/hudi/utilities/callback/kafka/HoodieWriteCommitKafkaCallback.java#L47]
and then enhancing the
[https://github.com/apache/hudi/blob/fcedbfcb58109f6af0b0a1e69915b40a0fcd5ccb/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/callback/common/HoodieWriteCommitCallbackMessage.java#L25]
to contain extra information just the commit metadata.
Based on the requirements, either (1) or (2) can be implemented.
> Supporting Kafka based checkpointing for HoodieDeltaStreamer
> ------------------------------------------------------------
>
> Key: HUDI-1910
> URL: https://issues.apache.org/jira/browse/HUDI-1910
> Project: Apache Hudi
> Issue Type: Improvement
> Components: DeltaStreamer
> Reporter: Nishith Agarwal
> Assignee: Nishith Agarwal
> Priority: Major
>
> HoodieDeltaStreamer currently supports commit metadata based checkpoint. Some
> users have requested support for Kafka based checkpoints for freshness
> auditing purposes. This ticket tracks any implementation for that.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)