[jira] [Commented] (FLINK-32700) Support job drain for Savepoint upgrade mode jobs in Flink Operator

Manan Mangal (Jira) Thu, 27 Jul 2023 01:02:04 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-32700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747830#comment-17747830
 ]


Manan Mangal commented on FLINK-32700:
--------------------------------------

Good point! We can add a new configuration option that lets users decide 
whether to drain their job. The Flink REST endpoint already supports this 
drain functionality:

[https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-stop]

The 'advanceToEndOfTime' parameter in the stopWithSavepoint method can be set 
based on the configured value.
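
As a rough illustration of the idea above, here is a minimal, self-contained 
Java sketch. Everything in it is an assumption made for illustration: the 
option key "example.job.drain-on-savepoint", the SavepointStopper interface 
(a stand-in for whatever client the operator uses), and the helper names are 
hypothetical and are not the operator's actual API.

```java
import java.util.Map;

class DrainOnSavepointSketch {

    // Hypothetical option key; the real name would be settled in the PR.
    static final String DRAIN_KEY = "example.job.drain-on-savepoint";

    /** Hypothetical stand-in for the client call that stops a job with a savepoint. */
    interface SavepointStopper {
        String stopWithSavepoint(String jobId, boolean advanceToEndOfTime, String savepointDir);
    }

    /** Reads the drain flag from the configuration; defaults to false (no drain). */
    static boolean shouldDrain(Map<String, String> conf) {
        return Boolean.parseBoolean(conf.getOrDefault(DRAIN_KEY, "false"));
    }

    /** Passes the configured flag through as the advanceToEndOfTime argument. */
    static String stopJob(
            SavepointStopper client, Map<String, String> conf, String jobId, String savepointDir) {
        return client.stopWithSavepoint(jobId, shouldDrain(conf), savepointDir);
    }
}
```

Defaulting the flag to false keeps the current behaviour for users who do not 
opt in.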

A related issue that can be addressed here: the job sometimes hits a timeout 
while creating the savepoint during cancel, which leaves the job blocked at 
the deletion stage until someone intervenes manually. For this case, we can 
call the cancel method in the TimeoutException catch block, so that we still 
proceed with stopping the job when the savepoint times out.
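
The fallback described above could look roughly like the following Java 
sketch. It is only a sketch under stated assumptions: the JobControl 
interface and the stopOrCancel helper are hypothetical names introduced for 
illustration, not the operator's real classes. Note that the fallback path 
ends the job without a savepoint.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class CancelOnSavepointTimeoutSketch {

    /** Hypothetical stand-in for the client the operator uses to control the job. */
    interface JobControl {
        CompletableFuture<String> stopWithSavepoint(boolean advanceToEndOfTime);

        CompletableFuture<Void> cancel();
    }

    /** Returns the savepoint path, or null if the job was cancelled without a savepoint. */
    static String stopOrCancel(JobControl job, boolean drain, long timeoutMs)
            throws InterruptedException, ExecutionException, TimeoutException {
        try {
            // Normal path: stop with a savepoint, bounded by the timeout.
            return job.stopWithSavepoint(drain).get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The savepoint did not complete in time: cancel the job so that
            // deletion is not blocked waiting for manual intervention.
            job.cancel().get(timeoutMs, TimeUnit.MILLISECONDS);
            return null;
        }
    }
}
```

Only TimeoutException triggers the fallback here; other failures still 
propagate to the caller, which can decide separately how to handle them.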

 

If this solution looks correct, I can work on a PR for these changes.

> Support job drain for Savepoint upgrade mode jobs in Flink Operator
> -------------------------------------------------------------------
>
>                 Key: FLINK-32700
>                 URL: https://issues.apache.org/jira/browse/FLINK-32700
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.5.0
>            Reporter: Manan Mangal
>            Priority: Major
>
> When a job is cancelled in savepoint upgrade mode, it can be allowed to 
> drain by advancing the watermark to the end before it is stopped, so that 
> in-flight data is not lost. 
> If the job fails to drain and hits a timeout or any other error, it can be 
> cancelled without taking a savepoint.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
