[ https://issues.apache.org/jira/browse/HUDI-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yanjia Gary Li updated HUDI-644:
--------------------------------
    Summary: checkpoint generator tool for delta streamer  (was: Enable to retrieve checkpoint from previous commits in Delta Streamer)

> checkpoint generator tool for delta streamer
> --------------------------------------------
>
>                 Key: HUDI-644
>                 URL: https://issues.apache.org/jira/browse/HUDI-644
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>          Components: DeltaStreamer
>            Reporter: Yanjia Gary Li
>            Assignee: Yanjia Gary Li
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 0.6.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket is to resolve the following problem:
> The user is using a homebrew Spark data source to read new data and write it
> to a Hudi table.
> The user would like to migrate to Delta Streamer.
> But the Delta Streamer only checks the metadata of the last commit. If that
> commit has no checkpoint info, the Delta Streamer uses the default. For a
> Kafka source, the default is LATEST.
> The user would like to run the homebrew Spark data source reader and Delta
> Streamer in parallel to prevent data loss, but the Spark data source writer
> makes commits without checkpoint info, which resets the Delta Streamer.
> So an option that lets the user retrieve the checkpoint from previous commits
> instead of only the latest commit would be helpful for the migration.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
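
The fallback requested in the ticket, reading the checkpoint from an earlier commit when the latest commit carries none, can be sketched roughly as follows. This is a minimal illustration, not Hudi code: each commit is modelled as a plain dictionary of metadata, and the key name `deltastreamer.checkpoint.key`, the function `latest_checkpoint`, and the sample checkpoint string are assumptions made for the example.

```python
from typing import Dict, List, Optional

# Assumed metadata key under which a checkpoint is stored (illustrative only).
CHECKPOINT_KEY = "deltastreamer.checkpoint.key"


def latest_checkpoint(commits: List[Dict[str, str]]) -> Optional[str]:
    """Return the most recent checkpoint found in a commit timeline.

    `commits` is ordered oldest to newest. Each entry is a hypothetical
    stand-in for the extra metadata attached to one commit.
    """
    # Walk backwards so the newest commit that has a checkpoint wins.
    for metadata in reversed(commits):
        checkpoint = metadata.get(CHECKPOINT_KEY)
        if checkpoint:
            return checkpoint
    # No commit carries a checkpoint: the caller would fall back to the
    # source default (LATEST for a Kafka source, per the ticket).
    return None


timeline = [
    {CHECKPOINT_KEY: "topic,0:100,1:95"},  # commit made by Delta Streamer
    {},                                    # commit made by the Spark data source writer
]
print(latest_checkpoint(timeline))
```

Checking only `timeline[-1]` would find no checkpoint here, which is the reset the ticket describes. Scanning older commits recovers the value written by the earlier Delta Streamer run.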