[ https://issues.apache.org/jira/browse/FLINK-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346322#comment-15346322 ]
ASF GitHub Bot commented on FLINK-3800:
---------------------------------------
Github user tillrohrmann commented on a diff in the pull request:
https://github.com/apache/flink/pull/2096#discussion_r68222088
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionGraph.java ---
@@ -1029,21 +1061,25 @@ else if (current == JobStatus.CANCELLING) {
 					}
 				}
 				else if (current == JobStatus.FAILING) {
-					if (restartStrategy.canRestart() && transitionState(current, JobStatus.RESTARTING)) {
-						// double check in case that in the meantime a SuppressRestartsException was thrown
-						if (restartStrategy.canRestart()) {
-							restartStrategy.restart(this);
-							break;
-						} else {
-							fail(new Exception("ExecutionGraph went into RESTARTING state but " +
-								"then the restart strategy was disabled."));
-						}
-
-					} else if (!restartStrategy.canRestart() && transitionState(current, JobStatus.FAILED, failureCause)) {
+					boolean allowRestart = !(failureCause instanceof SuppressRestartsException);
+
+					if (allowRestart && restartStrategy.canRestart() && transitionState(current, JobStatus.RESTARTING)) {
+						restartStrategy.restart(this);
+						break;
+					} else if ((!allowRestart || !restartStrategy.canRestart()) && transitionState(current, JobStatus.FAILED, failureCause)) {
 						postRunCleanup();
 						break;
 					}
 				}
+				else if (current == JobStatus.SUSPENDED) {
+					// we've already cleaned up when entering the SUSPENDED state
+					break;
+				}
+				else if (current.isGloballyTerminalState()) {
+					LOG.warn("Job has entered globally terminal state without waiting for all " +
+						"job vertices to reach final state.");
+					break;
+				}
 				else {
 					fail(new Exception("ExecutionGraph went into final state from state " + current));
--- End diff --
Yes you're right. The `break` is missing here. Will add it.
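The `break` statements in the diff indicate that this if/else chain runs inside a retry loop, which re-reads the job status until a compare-and-set transition succeeds. In such a loop, a branch that neither changes the state nor breaks re-reads the same value on every iteration and never returns. The following is a minimal sketch of that pattern. It assumes a `while (true)` loop and an `AtomicReference` holding the state; the class, enum, and method names are hypothetical and are not Flink's actual API.

```java
import java.util.concurrent.atomic.AtomicReference;

public class FinalStateTransitionSketch {

    // Hypothetical subset of job states for illustration; not Flink's JobStatus enum.
    public enum JobStatus { RUNNING, CANCELLING, FAILING, FINISHED, CANCELED, FAILED }

    private final AtomicReference<JobStatus> state;

    public FinalStateTransitionSketch(JobStatus initial) {
        this.state = new AtomicReference<>(initial);
    }

    public JobStatus state() {
        return state.get();
    }

    // Compare-and-set transition: returns false if another thread changed the state first.
    private boolean transitionState(JobStatus expected, JobStatus target) {
        return state.compareAndSet(expected, target);
    }

    // Invoked once every job vertex has reached a final state.
    public void allVerticesFinal() {
        while (true) {
            JobStatus current = state.get();
            if (current == JobStatus.RUNNING) {
                if (transitionState(current, JobStatus.FINISHED)) {
                    break;
                }
                // Lost a race with a concurrent transition: re-read the state and retry.
            } else if (current == JobStatus.CANCELLING) {
                if (transitionState(current, JobStatus.CANCELED)) {
                    break;
                }
            } else if (current == JobStatus.FAILING) {
                if (transitionState(current, JobStatus.FAILED)) {
                    break;
                }
            } else {
                // Unexpected state. Nothing here changes 'state', so without this
                // break the loop would re-read the same value and spin forever.
                System.err.println("Went into final state from state " + current);
                break;
            }
        }
    }

    public static void main(String[] args) {
        FinalStateTransitionSketch running = new FinalStateTransitionSketch(JobStatus.RUNNING);
        running.allVerticesFinal();
        System.out.println(running.state()); // FINISHED

        // Already terminal: only the fallback branch's break lets this call return.
        FinalStateTransitionSketch done = new FinalStateTransitionSketch(JobStatus.CANCELED);
        done.allVerticesFinal();
        System.out.println(done.state()); // CANCELED
    }
}
```

In the sketch, the success branches break only after their CAS succeeds. A failed CAS means another thread moved the state, so retrying is correct there. The fallback branch has no such progress, so it must exit unconditionally.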
> ExecutionGraphs can become orphans
> ----------------------------------
>
> Key: FLINK-3800
> URL: https://issues.apache.org/jira/browse/FLINK-3800
> Project: Flink
> Issue Type: Bug
> Components: JobManager
> Affects Versions: 1.0.0, 1.1.0
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
>
> The {{JobManager.cancelAndClearEverything}} method fails all currently
> executing jobs on the {{JobManager}} and then clears the list of
> {{currentJobs}} kept in the JobManager. This can become problematic if the
> user has set a restart strategy for a job, because the {{RestartStrategy}}
> will try to restart the job. This can lead to unwanted re-deployments of the
> job, which consume resources and thus interfere with the execution of other
> jobs. If the restart strategy never stops, this prevents the
> {{ExecutionGraph}}, which is no longer tracked in {{currentJobs}}, from ever
> being properly terminated.
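The FAILING branch in the diff above addresses this by letting the failure cause veto a restart. If the cause is a `SuppressRestartsException`, the graph goes to FAILED even when the configured strategy would restart indefinitely. The following is a minimal sketch of only that decision. The diff does not show where such exceptions are created, so the sketch simply assumes the teardown path supplies one. `SuppressRestartsException` and `RestartStrategy` below are local stand-ins defined so the example compiles. `onFailingJobFinished` and `Outcome` are hypothetical names. The compare-and-set step from the real code is omitted.

```java
public class RestartDecisionSketch {

    // Local stand-in for Flink's SuppressRestartsException, defined only so this compiles.
    public static class SuppressRestartsException extends RuntimeException {
        public SuppressRestartsException(Throwable cause) {
            super(cause);
        }
    }

    // Hypothetical one-method restart strategy; real strategies also perform the restart.
    public interface RestartStrategy {
        boolean canRestart();
    }

    public enum Outcome { RESTARTING, FAILED }

    // Mirrors the new FAILING branch: a SuppressRestartsException vetoes a restart
    // even when the configured strategy would allow one.
    public static Outcome onFailingJobFinished(Throwable failureCause, RestartStrategy strategy) {
        boolean allowRestart = !(failureCause instanceof SuppressRestartsException);
        if (allowRestart && strategy.canRestart()) {
            return Outcome.RESTARTING;
        }
        return Outcome.FAILED;
    }

    public static void main(String[] args) {
        // A strategy that never gives up, i.e. the problematic case from the issue.
        RestartStrategy neverGivesUp = () -> true;

        Throwable ordinary = new RuntimeException("task failure");
        Throwable suppressed = new SuppressRestartsException(new RuntimeException("teardown"));

        System.out.println(onFailingJobFinished(ordinary, neverGivesUp));   // RESTARTING
        System.out.println(onFailingJobFinished(suppressed, neverGivesUp)); // FAILED
    }
}
```

Placing the check on the failure cause, rather than re-querying `canRestart()` after entering RESTARTING as the removed code did, means the decision is made once, before any transition.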
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)