[jira] [Commented] (FLINK-9465) Specify a separate savepoint timeout option via CLI

Feifan Wang (Jira) Sat, 02 Oct 2021 07:16:07 -0700


    [ https://issues.apache.org/jira/browse/FLINK-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423539#comment-17423539 ]


Feifan Wang commented on FLINK-9465:
------------------------------------

Hi [~trohrmann], thanks for the reply. I think we can add the 
"savepoint-timeout" parameter in the following four places:

REST API:
 * [/jobs/:jobid/savepoints|https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops/rest_api/#jobs-jobid-savepoints]
 * [/jobs/:jobid/stop|https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops/rest_api/#jobs-jobid-stop]

Command-Line Interface:
 * [Creating a Savepoint|https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/cli/#creating-a-savepoint]
 * [Stopping a Job Gracefully Creating a Final Savepoint|https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/cli/#stopping-a-job-gracefully-creating-a-final-savepoint]

BTW, I noticed that the REST API and the CLI use different parameter naming 
styles: some parameters are in camel case and others are in kebab case. Should 
we use a uniform format?
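
To make the proposal and the naming question concrete, here is a rough sketch of what the two REST request bodies might look like. The existing field names ("target-directory", "cancel-job", "targetDirectory", "drain") are as I read them in the 1.14 REST docs. The "savepoint-timeout" and "savepointTimeout" fields are hypothetical and do not exist today, and the millisecond unit is only an assumption. The helper function names are made up for illustration.

```python
import json


def savepoint_request(target_directory, cancel_job=False, timeout_ms=None):
    """Body for POST /jobs/:jobid/savepoints (kebab-case fields today)."""
    body = {"target-directory": target_directory, "cancel-job": cancel_job}
    if timeout_ms is not None:
        # Hypothetical field: this is the proposed parameter, not an existing one.
        body["savepoint-timeout"] = timeout_ms
    return body


def stop_request(target_directory, drain=False, timeout_ms=None):
    """Body for POST /jobs/:jobid/stop (camelCase fields today)."""
    body = {"targetDirectory": target_directory, "drain": drain}
    if timeout_ms is not None:
        # Hypothetical field, spelled in camelCase only to match its siblings.
        body["savepointTimeout"] = timeout_ms
    return body


if __name__ == "__main__":
    # Ten-minute timeout, assuming the value is given in milliseconds.
    print(json.dumps(savepoint_request("s3://bucket/savepoints", timeout_ms=600_000)))
    print(json.dumps(stop_request("s3://bucket/savepoints", drain=True, timeout_ms=600_000)))
```

With the current field names, the same new option would end up spelled two different ways across the two endpoints, which is the inconsistency I am asking about.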

> Specify a separate savepoint timeout option via CLI
> ---------------------------------------------------
>
>                 Key: FLINK-9465
>                 URL: https://issues.apache.org/jira/browse/FLINK-9465
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.5.0
>            Reporter: Truong Duc Kien
>            Assignee: Feifan Wang
>            Priority: Minor
>              Labels: auto-deprioritized-major, pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Savepoints can take much longer to complete than checkpoints, especially 
> with incremental checkpoints enabled. This causes a couple of problems:
>  * For our job, we currently have to set the checkpoint timeout much larger 
> than necessary; otherwise we would be unable to take a savepoint. 
>  * During rush hour, our cluster encounters a high rate of checkpoint 
> timeouts due to backpressure, but we are unable to migrate to a larger 
> configuration because the savepoint also times out.
> In my opinion, the savepoint timeout should be configurable separately, 
> both in the config file and as a parameter to the savepoint command.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
