[ https://issues.apache.org/jira/browse/FLINK-32881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755763#comment-17755763 ]
Hangxiang Yu commented on FLINK-32881:
--------------------------------------
[~zhourenxiang] Sure, assigned to you, please go ahead.
> Client supports making savepoints in detach mode
> ------------------------------------------------
>
> Key: FLINK-32881
> URL: https://issues.apache.org/jira/browse/FLINK-32881
> Project: Flink
> Issue Type: Improvement
> Components: Client / Job Submission, Runtime / Checkpointing
> Affects Versions: 1.19.0
> Reporter: Renxiang Zhou
> Assignee: Renxiang Zhou
> Priority: Major
> Labels: detach-savepoint
> Fix For: 1.19.0
>
> Attachments: image-2023-08-16-17-14-34-740.png,
> image-2023-08-16-17-14-44-212.png
>
>
> When triggering a savepoint using the command-line tool, the client needs to
> wait for the job to finish creating the savepoint before it can exit. For
> jobs with large state, the savepoint creation process can be time-consuming,
> leading to the following problems:
> # Platform users may need to manage thousands of Flink tasks on a single
> client machine. With the current triggering mode, every savepoint-triggering
> thread on that machine blocks until its job finishes the snapshot, which
> wastes significant resources;
> # If creating the savepoint takes longer than the client's timeout, the
> client throws a timeout exception and reports that triggering the savepoint
> failed. Because savepoint durations vary from job to job, it is difficult to
> choose a suitable timeout on the client side.
> Therefore, we propose adding a detach mode for triggering savepoints from
> the client, similar to the detach mode used when submitting jobs.
> The specific details are:
> # The savepoint UUID is generated on the client side. After successfully
> triggering the savepoint, the client immediately returns the UUID and
> exits.
> # Add a "dump-pending-savepoints" API that allows the client to check
> whether the triggered savepoint has been successfully created.
> With these changes, the client can detach from the savepoint creation
> process, which reduces resource waste, while still having a way to check
> the status of savepoint creation.
> [Images: image-2023-08-16-17-14-34-740.png, image-2023-08-16-17-14-44-212.png]
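
For illustration only, the two proposal points in the quoted description (a client-generated UUID returned right away, plus a later status check) could be sketched roughly as follows. This is a minimal in-memory simulation, not Flink code: `FakeCluster`, `triggerDetached`, `finishSavepoint`, and the `Status` values are hypothetical names invented for this sketch, and the real change may look quite different.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a detached savepoint trigger; none of these names are Flink APIs. */
public class DetachedSavepointSketch {

    enum Status { PENDING, COMPLETED, FAILED, UNKNOWN }

    /** Stand-in for the cluster side, tracking savepoints by the id the client supplied. */
    static class FakeCluster {
        private final Map<String, Status> savepoints = new ConcurrentHashMap<>();

        /** Records the request and returns at once; the snapshot itself happens later. */
        void triggerSavepoint(String jobId, String savepointId) {
            savepoints.put(key(jobId, savepointId), Status.PENDING);
        }

        /** Simulates the job finishing (or failing) the snapshot at some later time. */
        void finishSavepoint(String jobId, String savepointId, boolean success) {
            // replace() only updates ids that were actually triggered.
            savepoints.replace(key(jobId, savepointId),
                    success ? Status.COMPLETED : Status.FAILED);
        }

        /** Status lookup, playing the role of the proposed pending-savepoint check. */
        Status status(String jobId, String savepointId) {
            return savepoints.getOrDefault(key(jobId, savepointId), Status.UNKNOWN);
        }

        private static String key(String jobId, String savepointId) {
            return jobId + "/" + savepointId;
        }
    }

    /** Detached trigger: generate the id on the client, send the request, return without waiting. */
    static String triggerDetached(FakeCluster cluster, String jobId) {
        String savepointId = UUID.randomUUID().toString();
        cluster.triggerSavepoint(jobId, savepointId);
        return savepointId;
    }

    public static void main(String[] args) {
        FakeCluster cluster = new FakeCluster();

        // The client gets the id back immediately and could exit here.
        String id = triggerDetached(cluster, "job-1");
        System.out.println("triggered " + id + " -> " + cluster.status("job-1", id));

        // Some time later the job completes the snapshot, and a new client run checks it.
        cluster.finishSavepoint("job-1", id, true);
        System.out.println("checked   " + id + " -> " + cluster.status("job-1", id));
    }
}
```

Because the id is chosen by the client, a later and entirely separate client invocation can look up the same savepoint without having held a thread open in between, which is the resource saving the description argues for.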
--
This message was sent by Atlassian Jira
(v8.20.10#820010)