[
https://issues.apache.org/jira/browse/FLINK-21029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267995#comment-17267995
]
Theo Diefenthal commented on FLINK-21029:
-----------------------------------------
Hi [~trohrmann], good question indeed. When I created the bug ticket, I
thought it was obvious: "When I want to stop a job, ultimately it should stop. I
don't care if there is an exception or not." That is also what I'm used to
with the Kafka consumer: the consumers often throw some "InterruptedException"
when shutting down a job, but I learned to ignore them.
On the other hand: if I can't create my savepoint for some reason, but
restoring the job would bring it back to a running state from which I could
potentially stop it cleanly with a savepoint again, it's probably better to
ignore the stop command and restart the job.
I didn't find anything about this behavior (if stop fails, the job tries to
restart) in the docs. Maybe it's sufficient to document it, state that it's
the desired behavior, and close the bug here? Or do we need to dig down further
and say: on this exception do this, and on that exception do something else?
Having a wrong savepoint path should IMHO stop the job: there is no chance to
ever stop it gracefully if the path is wrong (at least if it points to an
address not under your control, like an S3 bucket belonging to someone else),
so stopping it directly hurts less than keeping it running and letting it
eventually fail in the future.
> Failure of shutdown lead to restart of (connected) pipeline
> -----------------------------------------------------------
>
> Key: FLINK-21029
> URL: https://issues.apache.org/jira/browse/FLINK-21029
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.11.2
> Reporter: Theo Diefenthal
> Priority: Major
> Fix For: 1.13.0, 1.11.4, 1.12.2
>
>
> This bug happened in combination with
> https://issues.apache.org/jira/browse/FLINK-21028 .
> When I wanted to stop a job via CLI "flink stop..." with a disjoint job graph
> (independent pipelines in the graph), one task wasn't able to stop properly
> (reported in the mentioned bug). This led to the job restarting. I think this
> is wrong behavior in general and a separate bug:
> If any crash occurs while (trying) to stop a job, Flink shouldn't try to
> restart but should continue stopping the job.