[ 
https://issues.apache.org/jira/browse/HUDI-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanjia Gary Li updated HUDI-644:
--------------------------------
    Description: 
This ticket is to resolve the following problem:

The user has finished the initial load and written the data to a Hudi table.

The user would like to migrate to Delta Streamer.

The user needs a tool that provides the checkpoint for the first Delta 
Streamer run.

  was:
This ticket is to resolve the following problem:

The user is using a homebrew Spark data source to read new data and write it 
to a Hudi table.

The user would like to migrate to Delta Streamer.

However, Delta Streamer only checks the metadata of the last commit. If that 
commit has no checkpoint info, Delta Streamer falls back to the default, 
which for a Kafka source is LATEST.

The user would like to run the homebrew Spark data source reader and Delta 
Streamer in parallel to prevent data loss, but the Spark data source writer 
makes commits without checkpoint info, which resets Delta Streamer.

An option that lets the user retrieve the checkpoint from previous commits, 
instead of only the latest commit, would therefore help with the migration.
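
The lookup described above could be sketched roughly as follows. This is only 
an illustration, not Hudi code: commits are modelled as plain dictionaries, 
the helper name find_latest_checkpoint is hypothetical, and the metadata key 
and the checkpoint string format are assumptions that should be verified 
against the Hudi version in use.

```python
from typing import Dict, List, Optional

# Assumed name of the commit metadata key that holds the checkpoint.
CHECKPOINT_KEY = "deltastreamer.checkpoint.key"


def find_latest_checkpoint(commits: List[Dict]) -> Optional[str]:
    """Return the newest checkpoint found in any commit, or None.

    `commits` is ordered oldest to newest. Each entry is a dict with an
    optional "extraMetadata" dict (a simplified, hypothetical model of
    commit metadata).
    """
    # Walk backwards so commits without checkpoint info are skipped
    # instead of causing a fallback to the default.
    for commit in reversed(commits):
        checkpoint = commit.get("extraMetadata", {}).get(CHECKPOINT_KEY)
        if checkpoint:
            return checkpoint
    return None


# Illustrative timeline: a Delta Streamer commit followed by a commit from
# the Spark data source writer, which carries no checkpoint info.
timeline = [
    {"instant": "001", "extraMetadata": {CHECKPOINT_KEY: "topic,0:100"}},
    {"instant": "002", "extraMetadata": {}},
]
checkpoint = find_latest_checkpoint(timeline)
```

With this timeline, checking only the last commit would find nothing, while 
the backwards walk recovers the checkpoint stored in the earlier commit.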


> checkpoint generator tool for delta streamer
> --------------------------------------------
>
>                 Key: HUDI-644
>                 URL: https://issues.apache.org/jira/browse/HUDI-644
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>          Components: DeltaStreamer
>            Reporter: Yanjia Gary Li
>            Assignee: Yanjia Gary Li
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 0.6.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This ticket is to resolve the following problem:
> The user has finished the initial load and written the data to a Hudi table.
> The user would like to migrate to Delta Streamer.
> The user needs a tool that provides the checkpoint for the first Delta 
> Streamer run.



