[jira] [Commented] (FLINK-3800) ExecutionGraphs can become orphans

ASF GitHub Bot (JIRA) Thu, 23 Jun 2016 05:02:51 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346318#comment-15346318
 ]


ASF GitHub Bot commented on FLINK-3800:
---------------------------------------

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2096#discussion_r68221836
  
    --- Diff: docs/internals/job_scheduling.md ---
    @@ -74,7 +74,28 @@ Besides the vertices, the ExecutionGraph also contains 
the {% gh_link /flink-run
     <img src="fig/job_and_execution_graph.svg" alt="JobGraph and 
ExecutionGraph" height="400px" style="text-align: center;"/>
     </div>
     
    -During its execution, each parallel task goes through multiple stages, 
from *created* to *finished* or *failed*. The diagram below illustrates the 
    +Each ExecutionGraph has a job status associated with it.
    +This job status indicates the current state of the job execution.
    +
    +A Flink job is first in the *created* state, then switches to *running* 
and upon completion of all work it switches to *finished*.
    +In case of failures, a job switches first to *failing* where it cancels 
all running tasks.
    +If all job vertices have reached a final state and the job is not 
restartable, then the job transitions to *failed*.
    +If the job can be restarted, then it will enter the *restarting* state.
    +Once the job has been completely restarted, it will reach the *created* 
state.
    +
    +In case that the user cancels the job, it will go into the *cancelling* 
state.
    +This is also entails the cancellation of all currently running tasks.
    --- End diff --
    
    Thanks for spotting :-)
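
For readers following the diff, the job status transitions it describes can be sketched as a small lookup table. This is only an illustration built from the wording of the quoted paragraph, not Flink's actual `JobStatus` implementation. The class name `JobStatusSketch`, its methods, and the choice to allow cancellation from *created* and *running* are assumptions. The excerpt ends at *cancelling*, so no successor is modelled for that state.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the transitions described in the quoted docs paragraph.
final class JobStatusSketch {

    enum Status { CREATED, RUNNING, FINISHED, FAILING, FAILED, RESTARTING, CANCELLING }

    private static final Map<Status, Set<Status>> ALLOWED = new EnumMap<>(Status.class);

    static {
        // Normal path: created -> running -> finished.
        // Assumption: a user may cancel from created or running.
        ALLOWED.put(Status.CREATED, EnumSet.of(Status.RUNNING, Status.CANCELLING));
        ALLOWED.put(Status.RUNNING,
                EnumSet.of(Status.FINISHED, Status.FAILING, Status.CANCELLING));
        // Failure path: failing -> failed (not restartable) or restarting (restartable).
        ALLOWED.put(Status.FAILING, EnumSet.of(Status.FAILED, Status.RESTARTING));
        // A completely restarted job reaches created again.
        ALLOWED.put(Status.RESTARTING, EnumSet.of(Status.CREATED));
        // Final states in this sketch have no successors.
        ALLOWED.put(Status.FINISHED, EnumSet.noneOf(Status.class));
        ALLOWED.put(Status.FAILED, EnumSet.noneOf(Status.class));
        // The quoted excerpt does not describe what follows cancelling.
        ALLOWED.put(Status.CANCELLING, EnumSet.noneOf(Status.class));
    }

    /** Returns true if the sketch allows moving from one status to the other. */
    static boolean canTransition(Status from, Status to) {
        return ALLOWED.get(from).contains(to);
    }

    /** Status a failing job enters once all job vertices have reached a final state. */
    static Status afterFailing(boolean restartable) {
        return restartable ? Status.RESTARTING : Status.FAILED;
    }
}
```

In this sketch `canTransition(Status.RESTARTING, Status.CREATED)` holds. That loop is what lets a job whose restart strategy never stops keep being redeployed, which is the situation the issue below describes.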


> ExecutionGraphs can become orphans
> ----------------------------------
>
>                 Key: FLINK-3800
>                 URL: https://issues.apache.org/jira/browse/FLINK-3800
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager
>    Affects Versions: 1.0.0, 1.1.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>
> The {{JobManager.cancelAndClearEverything}} method fails all currently 
> executing jobs on the {{JobManager}} and then clears the list of 
> {{currentJobs}} kept in the JobManager. This can become problematic if the 
> user has set a restart strategy for a job, because the {{RestartStrategy}} 
> will try to restart the job. This can lead to unwanted re-deployments of the 
> job, which consume resources and thus disturb the execution of other 
> jobs. If the restart strategy never stops, this prevents the 
> {{ExecutionGraph}} from ever being properly terminated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
