[
https://issues.apache.org/jira/browse/FLINK-32700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747830#comment-17747830
]
Manan Mangal commented on FLINK-32700:
--------------------------------------
Good point! We can add a new configuration option that lets users choose
whether their job should be drained when it is stopped. The Flink REST API's
stop endpoint already supports draining:
[https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-stop]
The 'advanceToEndOfTime' parameter of the stopWithSavepoint method can then be
set from the configured value.
A related issue we can address here: the job sometimes hits a timeout while
creating the savepoint during cancel, which leaves it blocked at the deletion
stage until someone intervenes manually. For this case, we can call the cancel
method in the TimeoutException catch block, so that we still proceed with
stopping the job when the savepoint times out.
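To make the proposal concrete, here is a rough sketch of the control flow I
have in mind. It is not the operator's actual code: JobClient is a
hypothetical stand-in for whatever client the operator uses, and the drainOnStop
flag, the String job id, and the timeout argument are placeholder names chosen
only for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class DrainOnStopSketch {

    /** Hypothetical stand-in for the client the operator talks to. */
    interface JobClient {
        /** Stops the job with a savepoint; resolves to the savepoint path. */
        CompletableFuture<String> stopWithSavepoint(
                String jobId, boolean advanceToEndOfTime, String savepointDir);

        /** Cancels the job without taking a savepoint. */
        CompletableFuture<Void> cancel(String jobId);
    }

    /**
     * Stops the job with a savepoint, draining it first if drainOnStop is set.
     * If the savepoint does not complete within the timeout, falls back to a
     * plain cancel so that deletion is not blocked.
     *
     * @return the savepoint path, or null if the job was cancelled without one
     */
    static String stopJob(
            JobClient client,
            String jobId,
            boolean drainOnStop, // assumed to come from the new config option
            String savepointDir,
            long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        try {
            return client
                    .stopWithSavepoint(jobId, drainOnStop, savepointDir)
                    .get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Savepoint (or drain) timed out: cancel without a savepoint.
            client.cancel(jobId).get(timeoutMillis, TimeUnit.MILLISECONDS);
            return null;
        }
    }
}
```

Returning null on the fallback path is just a simple way to signal that no
savepoint exists; the real change would need to record that in the resource
status so a later upgrade does not expect one.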
If this solution looks correct, I can work on a PR for these changes.
> Support job drain for Savepoint upgrade mode jobs in Flink Operator
> -------------------------------------------------------------------
>
> Key: FLINK-32700
> URL: https://issues.apache.org/jira/browse/FLINK-32700
> Project: Flink
> Issue Type: Improvement
> Components: Kubernetes Operator
> Affects Versions: kubernetes-operator-1.5.0
> Reporter: Manan Mangal
> Priority: Major
>
> When a job is cancelled in savepoint upgrade mode, it can be allowed to
> drain by advancing the watermark to the end before it is stopped, so that
> in-flight data is not lost.
> If the job fails to drain and hits a timeout or any other error, it can be
> cancelled without taking a savepoint.