bhasudha commented on a change in pull request #4235:
URL: https://github.com/apache/hudi/pull/4235#discussion_r764452519



##########
File path: website/docs/hoodie_deltastreamer.md
##########
@@ -210,7 +210,137 @@ A deltastreamer job can then be triggered as follows:
 
 Read more in depth about concurrency control in the [concurrency control 
concepts](/docs/concurrency_control) section
 
+## Checkpointing
+HoodieDeltaStreamer uses checkpoints to keep track of what data has been read 
already so it can resume without needing to reprocess all data.
+When using a Kafka source, the checkpoint is the [Kafka 
Offset](https://cwiki.apache.org/confluence/display/KAFKA/Offset+Management) 
+When using a DFS source, the checkpoint is the 'last modified' timestamp of 
the latest file read.
+Checkpoints are saved in the .hoodie commit file as 
`deltastreamer.checkpoint.key`.
+
+If you need to change the checkpoints for reprocessing or replaying data you 
can use the following options:
+
+- `--checkpoint` will overwrite the current commit file checkpoint.

Review comment:
       @kywe665 Providing `--checkpoint` (which overrides the last known 
checkpoint from the commits) produces the `deltastreamer.checkpoint.reset_key` 
property in the .commit file, indicating the override. On a successful 
overwrite, the value of `--checkpoint` is stored in that property.
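
       To check whether an override was recorded, one could inspect the commit 
file directly. The snippet below is only a minimal sketch: it assumes the 
.commit file is JSON and that the checkpoint properties sit in a top-level 
`extraMetadata` map, which may differ across Hudi versions. The helper names 
and the sample values are hypothetical; only the two property names come from 
this discussion.

```python
import json

CHECKPOINT_KEY = "deltastreamer.checkpoint.key"
CHECKPOINT_RESET_KEY = "deltastreamer.checkpoint.reset_key"


def read_checkpoints(commit_json: str) -> dict:
    """Return the checkpoint entries found in a commit file's JSON text."""
    metadata = json.loads(commit_json)
    # Assumption: checkpoint properties are stored under "extraMetadata".
    extra = metadata.get("extraMetadata") or {}
    return {
        "checkpoint": extra.get(CHECKPOINT_KEY),
        "reset_checkpoint": extra.get(CHECKPOINT_RESET_KEY),
    }


def was_overridden(commit_json: str) -> bool:
    """True when the commit records a --checkpoint override."""
    return read_checkpoints(commit_json)["reset_checkpoint"] is not None


# Made-up commit contents, for illustration only.
overridden_commit = json.dumps({
    "extraMetadata": {
        CHECKPOINT_KEY: "example-checkpoint-2",
        CHECKPOINT_RESET_KEY: "example-checkpoint-1",
    }
})
plain_commit = json.dumps({
    "extraMetadata": {CHECKPOINT_KEY: "example-checkpoint-3"}
})

if __name__ == "__main__":
    print(read_checkpoints(overridden_commit))
    print(was_overridden(plain_commit))
```

       In practice the JSON text would be read from the latest commit file 
under the table's .hoodie directory rather than built inline.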




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.