[jira] [Commented] (FLINK-3800) ExecutionGraphs can become orphans

ASF GitHub Bot (JIRA) Thu, 23 Jun 2016 05:04:34 -0700

    [ https://issues.apache.org/jira/browse/FLINK-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346322#comment-15346322 ]


ASF GitHub Bot commented on FLINK-3800:
---------------------------------------

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2096#discussion_r68222088
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionGraph.java ---
    @@ -1029,21 +1061,25 @@ else if (current == JobStatus.CANCELLING) {
                                                }
                                        }
                                        else if (current == JobStatus.FAILING) {
    -                                           if (restartStrategy.canRestart() && transitionState(current, JobStatus.RESTARTING)) {
    -                                                   // double check in case that in the meantime a SuppressRestartsException was thrown
    -                                                   if (restartStrategy.canRestart()) {
    -                                                           restartStrategy.restart(this);
    -                                                           break;
    -                                                   } else {
    -                                                           fail(new Exception("ExecutionGraph went into RESTARTING state but " +
    -                                                                   "then the restart strategy was disabled."));
    -                                                   }
    -
    -                                           } else if (!restartStrategy.canRestart() && transitionState(current, JobStatus.FAILED, failureCause)) {
    +                                           boolean allowRestart = !(failureCause instanceof SuppressRestartsException);
    +
    +                                           if (allowRestart && restartStrategy.canRestart() && transitionState(current, JobStatus.RESTARTING)) {
    +                                                   restartStrategy.restart(this);
    +                                                   break;
    +                                           } else if ((!allowRestart || !restartStrategy.canRestart()) && transitionState(current, JobStatus.FAILED, failureCause)) {
                                                        postRunCleanup();
                                                        break;
                                                }
                                        }
    +                                   else if (current == JobStatus.SUSPENDED) {
    +                                           // we've already cleaned up when entering the SUSPENDED state
    +                                           break;
    +                                   }
    +                                   else if (current.isGloballyTerminalState()) {
    +                                           LOG.warn("Job has entered globally terminal state without waiting for all " +
    +                                                   "job vertices to reach final state.");
    +                                           break;
    +                                   }
                                        else {
                                                fail(new Exception("ExecutionGraph went into final state from state " + current));
    --- End diff --
    
    Yes, you're right. The `break` is missing here. Will add it.


> ExecutionGraphs can become orphans
> ----------------------------------
>
>                 Key: FLINK-3800
>                 URL: https://issues.apache.org/jira/browse/FLINK-3800
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager
>    Affects Versions: 1.0.0, 1.1.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>
> The {{JobManager.cancelAndClearEverything}} method fails all jobs currently 
> executing on the {{JobManager}} and then clears the list of 
> {{currentJobs}} kept in the JobManager. This becomes problematic if the 
> user has set a restart strategy for a job, because the {{RestartStrategy}} 
> will try to restart the job. This can lead to unwanted re-deployments of the 
> job, which consume resources and thus interfere with the execution of other 
> jobs. If the restart strategy never gives up, the {{ExecutionGraph}} can 
> never be properly terminated.
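
To make the restart decision in the diff above concrete, here is a minimal, self-contained Java sketch. It is not Flink's actual code: `RestartDecisionSketch`, its `Status` enum, the `RestartStrategy` interface, and the nested `SuppressRestartsException` are simplified, hypothetical stand-ins that only mirror the names used in the diff. The sketch assumes that a failure cause of type `SuppressRestartsException` should always lead to FAILED, whatever the restart strategy reports.

```java
// Hypothetical, simplified stand-ins for illustration only; not Flink's real classes.
final class RestartDecisionSketch {

    /** Only the job states relevant to this decision. */
    enum Status { FAILING, RESTARTING, FAILED }

    /** Marker exception (assumed shape): a failure cause of this type must never trigger a restart. */
    static final class SuppressRestartsException extends RuntimeException {
        SuppressRestartsException(Throwable cause) {
            super(cause);
        }
    }

    /** Minimal stand-in for a restart strategy: it only answers whether a restart is possible. */
    interface RestartStrategy {
        boolean canRestart();
    }

    /**
     * Decides which state follows FAILING once all vertices have reached a final state.
     * A restart happens only if the failure cause does not suppress restarts
     * AND the strategy still permits one; otherwise the job goes to FAILED.
     */
    static Status nextState(Throwable failureCause, RestartStrategy strategy) {
        boolean allowRestart = !(failureCause instanceof SuppressRestartsException);
        return (allowRestart && strategy.canRestart()) ? Status.RESTARTING : Status.FAILED;
    }

    public static void main(String[] args) {
        RestartStrategy alwaysRestart = () -> true;

        // An ordinary failure with a willing strategy leads to a restart.
        System.out.println(nextState(new Exception("task failed"), alwaysRestart)); // prints RESTARTING

        // A suppressed failure is terminal even though the strategy would restart forever.
        Throwable suppressed = new SuppressRestartsException(new Exception("JobManager is clearing all jobs"));
        System.out.println(nextState(suppressed, alwaysRestart)); // prints FAILED
    }
}
```

In this sketch the suppression travels with the failure cause instead of being toggled on the strategy, so a single check decides the outcome and no second "double check" after entering RESTARTING is needed. That appears to be the intent of the `allowRestart` flag in the diff, though the real `ExecutionGraph` also has to handle the atomic state transition, which is omitted here.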



