[
https://issues.apache.org/jira/browse/FLINK-21029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267929#comment-17267929
]
Till Rohrmann commented on FLINK-21029:
---------------------------------------
Thanks for reporting this issue [~TheoD]. What would be the desired behaviour?
Would you expect Flink to retry the stop-with-savepoint if the operation
failed? How often should it be allowed to retry in case there is a
systematic problem (e.g. a wrong savepoint path or a full directory)?
Would you expect Flink to then go into a FAILED state if the
stop-with-savepoint operation cannot be completed?
At the moment, the contract is that Flink tries to perform stop-with-savepoint
and, if it fails, reports the failure back to the user and tries to recover
from it.
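Given that contract, a bounded retry today has to live on the client side. The sketch below wraps an arbitrary command in a retry loop; the `flink stop -p <savepointDir> <jobId>` invocation shown in the usage comment is the standard CLI for stop-with-savepoint, while the job id and savepoint directory are placeholders, and the wrapper itself is only an illustration of the retry policy being discussed, not anything Flink provides.

```shell
#!/usr/bin/env sh
# Sketch: bounded client-side retries around stop-with-savepoint.
# The retry wrapper is generic POSIX shell; the flink invocation in the
# usage comment below uses placeholder values (job id, savepoint dir).

# retry MAX CMD...  -- run CMD, retrying up to MAX times until it succeeds.
retry() {
  max="$1"; shift
  n=0
  until "$@"; do
    n=$((n + 1))
    if [ "$n" -ge "$max" ]; then
      echo "command failed after $max attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed, retrying..." >&2
  done
}

# Usage (placeholder job id and savepoint target):
# retry 3 flink stop -p s3://bucket/savepoints <jobId>
```

Note that retrying only makes sense for transient failures; a systematic problem such as a wrong savepoint path will fail on every attempt, which is exactly the trade-off raised above.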
> Failure of shutdown leads to restart of (connected) pipeline
> ------------------------------------------------------------
>
> Key: FLINK-21029
> URL: https://issues.apache.org/jira/browse/FLINK-21029
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.11.2
> Reporter: Theo Diefenthal
> Priority: Major
> Fix For: 1.13.0, 1.11.4, 1.12.2
>
>
> This bug happened in combination with
> https://issues.apache.org/jira/browse/FLINK-21028 .
> When I wanted to stop a job via the CLI ("flink stop ...") with a disjoint job
> graph (independent pipelines in the graph), one task wasn't able to stop
> properly (reported in the mentioned bug). This led to the job being restarted.
> I think this is wrong behaviour in general and a separate bug:
> If any crash occurs while trying to stop a job, Flink shouldn't try to restart
> it but should continue stopping the job.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)