Theo Diefenthal created FLINK-21030:
---------------------------------------

             Summary: Broken job restart for job with disjoint graph
                 Key: FLINK-21030
                 URL: https://issues.apache.org/jira/browse/FLINK-21030
             Project: Flink
          Issue Type: Bug
    Affects Versions: 1.11.2
            Reporter: Theo Diefenthal


Building on top of the bugs https://issues.apache.org/jira/browse/FLINK-21028
and https://issues.apache.org/jira/browse/FLINK-21029:

I tried to stop a Flink application on YARN with a savepoint, which didn't 
succeed due to a possible bug or race condition in shutdown (FLINK-21028). For 
some reason, Flink then attempted to restart the pipeline after the failure in 
shutdown (FLINK-21029). The bug reported here:

As I mentioned, my job graph is disjoint and the pipelines are fully isolated. 
Let's say the original error occurred in a single task of pipeline1. Flink then 
restarted the entire pipeline1, but pipeline2 was shut down successfully and 
switched its state to FINISHED.

My job was thus in a kind of invalid state after the attempt to stop it: one 
of the two pipelines was running, the other was FINISHED. I guess this is a bug 
in the restart behavior: only the connected component of the graph that 
contains the failed task is restarted, but the others aren't...
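
To make the suspected behavior concrete, here is a minimal sketch in plain 
Python. It is a toy model, not Flink code: the task names, the CANCELED state 
of the pipeline1 tasks before the restart, and both helper functions are 
assumptions made up for illustration. It only shows how restarting the single 
connected component that contains the failed task leaves the job with one 
pipeline RUNNING and the other FINISHED.

```python
from collections import defaultdict


def connected_components(vertices, edges):
    """Group vertices into connected components, treating edges as undirected."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, components = set(), []
    for v in vertices:
        if v in seen:
            continue
        # Depth-first walk from v collects everything reachable from it.
        stack, component = [v], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(neighbors[node] - seen)
        components.append(component)
    return components


def restart_failed_component(vertices, edges, states, failed_task):
    """Toy model: restart only the component that contains failed_task."""
    new_states = dict(states)
    for component in connected_components(vertices, edges):
        if failed_task in component:
            for task in component:
                new_states[task] = "RUNNING"
    return new_states


# Hypothetical job graph with two fully isolated pipelines.
vertices = ["source1", "map1", "sink1", "source2", "map2", "sink2"]
edges = [("source1", "map1"), ("map1", "sink1"),
         ("source2", "map2"), ("map2", "sink2")]

# Assumed task states after the failed stop: map1 failed in pipeline1,
# while pipeline2 shut down cleanly.
states = {"source1": "CANCELED", "map1": "FAILED", "sink1": "CANCELED",
          "source2": "FINISHED", "map2": "FINISHED", "sink2": "FINISHED"}

after = restart_failed_component(vertices, edges, states, "map1")
for task in vertices:
    print(task, after[task])
```

In this model all pipeline1 tasks end up RUNNING while all pipeline2 tasks 
stay FINISHED, which matches the mixed state I observed.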



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
