[ 
https://issues.apache.org/jira/browse/FLINK-21029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267995#comment-17267995
 ] 

Theo Diefenthal commented on FLINK-21029:
-----------------------------------------

Hi [~trohrmann], good question indeed. When I created the bug ticket, I 
thought it was obvious: "When I want to stop a job, it should ultimately stop. 
I don't care whether there is an exception or not." That is also what I'm used 
to with the Kafka consumer: the consumers often throw an 
"InterruptedException" when a job shuts down, and I have learned to ignore it.

On the other hand: if I can't create my savepoint for some reason, but 
restoring the job would bring it back to a running state from which I could 
potentially stop it cleanly with a savepoint again, it's probably better to 
ignore the stop command and restart the job.

I didn't find anything about this behavior (if stop fails, the job tries to 
restart) in the docs. Maybe it's sufficient to add this to the docs, state that 
it is the desired behavior, and close the bug here? Or do we need to dig 
further and say: on this exception do one thing, and on that exception do 
something else?

A wrong savepoint path should IMHO stop the job: there is no chance of ever 
stopping it gracefully if the path is wrong (at least if it points to a 
location not under your control, like someone else's S3 bucket), so stopping 
it directly hurts less than keeping it running and letting it eventually fail 
in the future.

> Failure of shutdown lead to restart of (connected) pipeline
> -----------------------------------------------------------
>
>                 Key: FLINK-21029
>                 URL: https://issues.apache.org/jira/browse/FLINK-21029
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.11.2
>            Reporter: Theo Diefenthal
>            Priority: Major
>             Fix For: 1.13.0, 1.11.4, 1.12.2
>
>
> This bug happened in combination with 
> https://issues.apache.org/jira/browse/FLINK-21028 .
> When I wanted to stop a job via the CLI ("flink stop...") with a disjoint job 
> graph (independent pipelines in the graph), one task wasn't able to stop 
> properly (reported in the mentioned bug). This led to a restart of the job. 
> I think this is wrong behavior in general and a separate bug:
> If any crash occurs while trying to stop a job, Flink shouldn't try to 
> restart it but should continue stopping the job.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
