[ 
https://issues.apache.org/jira/browse/HUDI-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanjia Gary Li updated HUDI-644:
--------------------------------
    Summary: checkpoint generator tool for delta streamer  (was: Enable to 
retrieve checkpoint from previous commits in Delta Streamer)

> checkpoint generator tool for delta streamer
> --------------------------------------------
>
>                 Key: HUDI-644
>                 URL: https://issues.apache.org/jira/browse/HUDI-644
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>          Components: DeltaStreamer
>            Reporter: Yanjia Gary Li
>            Assignee: Yanjia Gary Li
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 0.6.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> This ticket addresses the following problem:
> The user currently uses a homebrew Spark data source to read new data and
> write it to a Hudi table.
> The user would like to migrate to Delta Streamer.
> However, Delta Streamer only checks the metadata of the last commit. If that
> commit has no checkpoint info, Delta Streamer falls back to the default,
> which for the Kafka source is LATEST.
> The user would like to run the homebrew Spark data source reader and Delta
> Streamer in parallel to prevent data loss, but the Spark data source writer
> makes commits without checkpoint info, which resets Delta Streamer to that
> default.
> An option that lets the user retrieve the checkpoint from previous commits,
> rather than only the latest one, would therefore help with the migration.
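
The lookup requested in the description can be sketched roughly as follows. This
is only an illustration under stated assumptions, not the Hudi implementation:
the timeline is modeled as a plain list of dicts ordered oldest to newest, the
field names `instant` and `extraMetadata` are hypothetical, and the metadata key
`deltastreamer.checkpoint.key` is assumed rather than taken from the ticket.

```python
from typing import Mapping, Optional, Sequence, Tuple

# Assumed name of the commit-metadata key holding the checkpoint;
# the ticket itself does not spell it out.
CHECKPOINT_KEY = "deltastreamer.checkpoint.key"


def find_latest_checkpoint(
    commits: Sequence[Mapping],
) -> Optional[Tuple[str, str]]:
    """Walk commits from newest to oldest and return the first checkpoint found.

    `commits` is ordered oldest to newest. Each entry is a hypothetical dict
    with an "instant" (commit time) and an optional "extraMetadata" mapping.
    Returns (instant, checkpoint), or None if no commit carries a checkpoint.
    """
    for commit in reversed(commits):
        extra = commit.get("extraMetadata") or {}
        checkpoint = extra.get(CHECKPOINT_KEY)
        if checkpoint:
            return commit["instant"], checkpoint
    return None


# Illustrative timeline: the newest commit was written by the Spark data
# source and therefore has no checkpoint. The checkpoint string is made up.
timeline = [
    {"instant": "20200301120000",
     "extraMetadata": {CHECKPOINT_KEY: "topic,0:100,1:120"}},
    {"instant": "20200301130000", "extraMetadata": {}},
]
result = find_latest_checkpoint(timeline)
```

Checking only the last entry would find nothing here and fall back to the
default, whereas walking backwards recovers the checkpoint from the earlier
Delta Streamer commit.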



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
