[
https://issues.apache.org/jira/browse/HUDI-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yanjia Gary Li updated HUDI-644:
--------------------------------
Status: Open (was: New)
> Enable to retrieve checkpoint from previous commits in Delta Streamer
> ---------------------------------------------------------------------
>
> Key: HUDI-644
> URL: https://issues.apache.org/jira/browse/HUDI-644
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Components: DeltaStreamer
> Reporter: Yanjia Gary Li
> Assignee: Yanjia Gary Li
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> This ticket addresses the following problem:
> The user currently uses a homebrew Spark data source to read new data and
> write it to a Hudi table.
> The user would like to migrate to Delta Streamer.
> However, Delta Streamer only checks the metadata of the last commit. If that
> commit has no checkpoint info, Delta Streamer falls back to the default,
> which for the Kafka source is LATEST.
> To prevent data loss during the migration, the user would like to run the
> homebrew Spark job and Delta Streamer in parallel. But the Spark data source
> writer makes commits without checkpoint info, and each such commit resets
> Delta Streamer to the default.
> An option that lets Delta Streamer retrieve the checkpoint from an earlier
> commit, rather than only from the latest one, would help with the
> migration.
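
The lookup described in the ticket could be sketched roughly as follows. This is only an illustration of the idea, not the actual Delta Streamer code: the function name, the `max_lookback` parameter, and the in-memory list of commit metadata are hypothetical, and the metadata key is assumed to be `deltastreamer.checkpoint.key`, which should be verified against the Hudi version in use.

```python
from typing import Mapping, Optional, Sequence

# Assumed metadata key under which Delta Streamer stores its checkpoint.
CHECKPOINT_KEY = "deltastreamer.checkpoint.key"


def find_latest_checkpoint(
    commits: Sequence[Mapping[str, str]],
    max_lookback: Optional[int] = None,
) -> Optional[str]:
    """Return the newest checkpoint found in the commit metadata.

    `commits` is ordered oldest to newest; each entry is the extra
    metadata of one commit. `max_lookback` (hypothetical) limits how
    many of the most recent commits are inspected; None means all.
    """
    inspected = 0
    for metadata in reversed(commits):
        if max_lookback is not None and inspected >= max_lookback:
            break
        inspected += 1
        checkpoint = metadata.get(CHECKPOINT_KEY)
        if checkpoint:
            # First hit while walking backwards is the newest checkpoint.
            return checkpoint
    # No checkpoint found: the caller falls back to the source default.
    return None


# A Delta Streamer commit followed by a Spark data source commit
# that carries no checkpoint info.
example_commits = [
    {CHECKPOINT_KEY: "topic,0:100"},
    {},
]
```

With `max_lookback=1` the sketch mirrors the current behaviour (only the last commit is checked, so nothing is found here); without a limit it walks back to the earlier Delta Streamer commit and recovers its checkpoint.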
--
This message was sent by Atlassian Jira
(v8.3.4#803005)