[ 
https://issues.apache.org/jira/browse/FLINK-32881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755576#comment-17755576
 ] 

Renxiang Zhou commented on FLINK-32881:
---------------------------------------

Hi, [~masteryhx]

I would be glad to contribute the code for this. Could you please assign this 
ticket to me? Thank you.

> Client supports making savepoints in detach mode
> ------------------------------------------------
>
>                 Key: FLINK-32881
>                 URL: https://issues.apache.org/jira/browse/FLINK-32881
>             Project: Flink
>          Issue Type: Improvement
>          Components: Client / Job Submission, Runtime / Checkpointing
>    Affects Versions: 1.19.0
>            Reporter: Renxiang Zhou
>            Priority: Major
>              Labels: detach-savepoint
>             Fix For: 1.19.0
>
>         Attachments: image-2023-08-16-17-14-34-740.png, 
> image-2023-08-16-17-14-44-212.png
>
>
> When triggering a savepoint using the command-line tool, the client needs to 
> wait for the job to finish creating the savepoint before it can exit. For 
> jobs with large state, the savepoint creation process can be time-consuming, 
> leading to the following problems:
>  # Platform users may need to manage thousands of Flink tasks from a single 
> client machine. With the current triggering mode, every savepoint-triggering 
> thread on that machine must block until its job finishes the snapshot, 
> which wastes significant client resources.
>  # If creating the savepoint takes longer than the client's timeout, the 
> client throws a timeout exception and reports that triggering the savepoint 
> failed. Because savepoint durations vary from job to job, it is difficult 
> to choose a suitable timeout on the client side.
>
> Therefore, we propose adding a detach mode for triggering savepoints from the 
> client, similar to the detach mode used when submitting jobs. 
> Here are some specific details:
>  # The savepoint UUID will be generated on the client side. After 
> successfully triggering the savepoint, the client immediately returns the 
> UUID information and exits.
>  # Add a "dump-pending-savepoints" API that allows the client to check 
> whether the triggered savepoint has been successfully created.
> By implementing these changes, the client can detach from the savepoint 
> creation process, reducing resource waste, and providing a way to check the 
> status of savepoint creation.
> !image-2023-08-16-17-14-34-740.png|width=2129,height=625!!image-2023-08-16-17-14-44-212.png|width=1530,height=445!
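
The proposed flow can be pictured with a small in-memory sketch. It is only an illustration under stated assumptions, not Flink's implementation: the `Coordinator` class, the `Status` enum, and the method names `trigger`, `complete`, `dumpPendingSavepoints`, and `triggerDetached` are all hypothetical stand-ins. The only parts taken from the description above are that the client generates the savepoint UUID, returns right after triggering, and can later ask which savepoints are still pending.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

class DetachedSavepointSketch {

    /** Hypothetical savepoint states; not Flink's actual types. */
    enum Status { PENDING, COMPLETED }

    /** Hypothetical stand-in for the cluster side that tracks savepoints by ID. */
    static class Coordinator {
        private final Map<String, Status> savepoints = new ConcurrentHashMap<>();

        /** Registers the savepoint and returns without waiting for the snapshot. */
        void trigger(String savepointId) {
            savepoints.put(savepointId, Status.PENDING);
        }

        /** Simulates the job finishing the snapshot at some later time. */
        void complete(String savepointId) {
            savepoints.replace(savepointId, Status.COMPLETED);
        }

        /** Rough analogue of the proposed "dump-pending-savepoints" query. */
        Set<String> dumpPendingSavepoints() {
            Set<String> pending = new TreeSet<>();
            savepoints.forEach((id, status) -> {
                if (status == Status.PENDING) {
                    pending.add(id);
                }
            });
            return pending;
        }
    }

    /** Client side: generate the UUID locally, trigger, and return at once. */
    static String triggerDetached(Coordinator coordinator) {
        String savepointId = UUID.randomUUID().toString();
        coordinator.trigger(savepointId);
        return savepointId;
    }

    public static void main(String[] args) {
        Coordinator coordinator = new Coordinator();

        // The client gets its ID back immediately; no thread blocks on the snapshot.
        String id = triggerDetached(coordinator);
        System.out.println("triggered savepoint " + id);
        System.out.println("still pending: "
                + coordinator.dumpPendingSavepoints().contains(id));

        // Later, once the job has written the snapshot, the ID is no longer pending.
        coordinator.complete(id);
        System.out.println("still pending: "
                + coordinator.dumpPendingSavepoints().contains(id));
    }
}
```

Because the client holds nothing but the returned ID, a client-side timeout no longer has to be tuned to each job's savepoint duration; the client can poll the pending set whenever it needs the result.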



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
