[ 
https://issues.apache.org/jira/browse/FLINK-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157542#comment-15157542
 ] 

ASF GitHub Bot commented on FLINK-3443:
---------------------------------------

Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1669#discussion_r53678441
  
    --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
    @@ -1487,7 +1487,7 @@ class JobManager(
               }
             }
     
    -        eg.fail(cause)
    +        eg.cancel()
    --- End diff --
    
    The problem is that `postStop()` is called not only when the cluster is 
cleanly shut down (in which case you want the job state removed) but also, in 
some cases, on master failure, where we must be sure that the state is not 
removed from ZooKeeper.
    
    The `Integer.MAX_VALUE` issue is less critical...
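
To make the `postStop()` concern concrete, here is a minimal Java sketch of a stop hook that removes job state only on a clean shutdown. It is a toy model under stated assumptions, not Flink code: `StopReason`, `ToyStateStore` and `onStop` are hypothetical names, and the in-memory map merely stands in for a recovery store such as ZooKeeper. How the real `postStop()` could tell the two cases apart is not stated in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these names exist in Flink.
final class StopHookSketch {

    /** Hypothetical reasons for which a stop hook might run. */
    enum StopReason { CLEAN_SHUTDOWN, MASTER_FAILURE }

    /** In-memory stand-in for a recovery store such as ZooKeeper. */
    static final class ToyStateStore {
        private final Map<String, String> jobs = new HashMap<>();

        void put(String jobId, String state) { jobs.put(jobId, state); }
        void remove(String jobId) { jobs.remove(jobId); }
        boolean contains(String jobId) { return jobs.containsKey(jobId); }
    }

    /** Removes job state only when the shutdown is known to be clean. */
    static void onStop(StopReason reason, ToyStateStore store, String jobId) {
        if (reason == StopReason.CLEAN_SHUTDOWN) {
            store.remove(jobId);
        }
        // On MASTER_FAILURE the state is left untouched so that a new
        // master can still recover the job from it.
    }

    public static void main(String[] args) {
        ToyStateStore store = new ToyStateStore();
        store.put("job-1", "recovery-data");

        onStop(StopReason.MASTER_FAILURE, store, "job-1");
        System.out.println("after master failure: " + store.contains("job-1"));

        onStop(StopReason.CLEAN_SHUTDOWN, store, "job-1");
        System.out.println("after clean shutdown: " + store.contains("job-1"));
    }
}
```

The point of the sketch is only that an unconditional cleanup in the stop hook is unsafe when the same hook also runs on failure.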


> JobManager cancel and clear everything fails jobs instead of cancelling
> -----------------------------------------------------------------------
>
>                 Key: FLINK-3443
>                 URL: https://issues.apache.org/jira/browse/FLINK-3443
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Runtime
>            Reporter: Ufuk Celebi
>            Assignee: Ufuk Celebi
>
> When the job manager is shut down, it calls {{cancelAndClearEverything}}. 
> This method does not {{cancel}} the {{ExecutionGraph}} instances, but 
> {{fail}}s them, which can lead to {{ExecutionGraph}} restart.
> I've noticed this in tests, where an old graph got into a loop of restarts.
> What I don't understand is why the futures etc. are not cancelled when the 
> executor service is shut down.
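
The difference between failing and cancelling described in the issue can be sketched with a toy Java model. This is not Flink's `ExecutionGraph`: `ToyGraph`, its restart budget and its states are hypothetical and simplified, and are only meant to show why `fail` can trigger restarts while `cancel` is terminal.

```java
// Toy model only: a simplified stand-in, not Flink's ExecutionGraph.
final class RestartSketch {

    enum State { RUNNING, FAILED, CANCELED }

    /** Hypothetical graph with a bounded number of restart attempts. */
    static final class ToyGraph {
        private State state = State.RUNNING;
        private int restartsLeft;
        private int restarts;

        ToyGraph(int restartsLeft) { this.restartsLeft = restartsLeft; }

        /** A failure is treated as recoverable while restarts remain. */
        void fail(Throwable cause) {
            if (state != State.RUNNING) {
                return; // FAILED and CANCELED are terminal in this model
            }
            if (restartsLeft > 0) {
                restartsLeft--;
                restarts++; // the graph is restarted and keeps RUNNING
            } else {
                state = State.FAILED;
            }
        }

        /** Cancellation is terminal: no restart is attempted. */
        void cancel() {
            if (state == State.RUNNING) {
                state = State.CANCELED;
            }
        }

        State state() { return state; }
        int restarts() { return restarts; }
    }

    public static void main(String[] args) {
        ToyGraph failed = new ToyGraph(3);
        failed.fail(new RuntimeException("shutting down"));
        System.out.println(failed.state() + ", restarts=" + failed.restarts());

        ToyGraph cancelled = new ToyGraph(3);
        cancelled.cancel();
        System.out.println(cancelled.state() + ", restarts=" + cancelled.restarts());
    }
}
```

In this model, a shutdown path that calls `fail` keeps the graph alive for as long as restart attempts remain, which matches the restart loop observed in the tests; calling `cancel` ends it immediately.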



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
