Yanjia Gary Li created HUDI-644:
-----------------------------------
Summary: Enable to retrieve checkpoint from previous commits in
Delta Streamer
Key: HUDI-644
URL: https://issues.apache.org/jira/browse/HUDI-644
Project: Apache Hudi (incubating)
Issue Type: Improvement
Components: DeltaStreamer
Reporter: Yanjia Gary Li
Assignee: Yanjia Gary Li
This ticket is to resolve the following problem:
The user is using a homebrew Spark data source to read new data and write it
to a Hudi table.
The user would like to migrate to Delta Streamer.
However, Delta Streamer only checks the metadata of the latest commit. If that
commit has no checkpoint info, Delta Streamer falls back to the default
starting position. For the Kafka source, the default is LATEST.
The user would like to run the homebrew Spark data source reader and Delta
Streamer in parallel to prevent data loss, but the Spark data source writer
makes commits without checkpoint info, and each such commit resets the Delta
Streamer to the default.
An option that lets the user retrieve the checkpoint from previous commits,
instead of only the latest commit, would therefore be helpful for the
migration.
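
The difference between the current behavior and the proposed option can be sketched as follows. This is only an illustration in plain Python, not Hudi code: the commit timeline is modeled as a list of metadata dictionaries (oldest first), the key name `deltastreamer.checkpoint.key` and the checkpoint strings are assumptions, and the function names `checkpoint_from_latest_commit` and `checkpoint_from_previous_commits` are hypothetical.

```python
from typing import Mapping, Optional, Sequence

# Assumed name of the commit-metadata key that holds the checkpoint.
CHECKPOINT_KEY = "deltastreamer.checkpoint.key"


def checkpoint_from_latest_commit(
    commits: Sequence[Mapping[str, str]],
) -> Optional[str]:
    """Current behavior as described in the ticket: inspect only the last commit."""
    if not commits:
        return None
    return commits[-1].get(CHECKPOINT_KEY)


def checkpoint_from_previous_commits(
    commits: Sequence[Mapping[str, str]],
) -> Optional[str]:
    """Proposed option: walk back until a commit with checkpoint info is found."""
    for metadata in reversed(commits):
        checkpoint = metadata.get(CHECKPOINT_KEY)
        if checkpoint:
            return checkpoint
    return None


# Hypothetical timeline, oldest first. Commits with empty metadata stand in
# for writes made by the homebrew Spark data source writer.
timeline = [
    {CHECKPOINT_KEY: "topic,0:100"},
    {},
    {CHECKPOINT_KEY: "topic,0:250"},
    {},
]

latest = checkpoint_from_latest_commit(timeline)
recovered = checkpoint_from_previous_commits(timeline)
```

With this timeline, the latest-commit lookup finds nothing, so the streamer would fall back to the default, while the backward search recovers the checkpoint written by the most recent Delta Streamer commit.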