[ https://issues.apache.org/jira/browse/FLINK-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885588#comment-15885588 ]
ASF GitHub Bot commented on FLINK-5501:
---------------------------------------

Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/3385

    Thanks! I think I can take this over now...

> Determine whether the job starts from last JobManager failure
> -------------------------------------------------------------
>
>                 Key: FLINK-5501
>                 URL: https://issues.apache.org/jira/browse/FLINK-5501
>             Project: Flink
>          Issue Type: Sub-task
>          Components: JobManager
>            Reporter: zhijiang
>            Assignee: shuai.xu
>
> When the {{JobManagerRunner}} is granted leadership, it should check whether
> the current job is already running. If the job is running, the
> {{JobManager}} should reconcile itself (enter the RECONCILING state) and wait
> for the {{TaskManager}}s to report their task status. Otherwise the
> {{JobManager}} can schedule the {{ExecutionGraph}} in the usual way.
> The {{RunningJobsRegistry}} can provide a way to check whether the job is
> running, but we should expand the current interface and fix the related
> process to support this function.
> 1. {{RunningJobsRegistry}} sets the RUNNING status when the
> {{JobManagerRunner}} is granted leadership for the first time.
> 2. When the job finishes, {{RunningJobsRegistry}} sets the job status to
> FINISHED, and the status is deleted before exit.
> 3. If the mini cluster starts multiple {{JobManagerRunner}}s and the leading
> {{JobManagerRunner}} has already finished the job and set its status to
> FINISHED, the other {{JobManagerRunner}}s will exit when they are granted
> leadership.
> 4. If the {{JobManager}} fails, the job status stays RUNNING. So when a
> {{JobManagerRunner}} (the previous one or a new one) is granted leadership
> again, it will check the job status and enter the {{RECONCILING}} state.
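
For illustration only, the decision described in points 1-4 could be sketched roughly as follows. This is not the code from the pull request and not Flink's actual API: the class LeadershipSketch, the InMemoryRegistry stand-in, the Action enum and the method onLeadershipGranted are all hypothetical names, and the sketch assumes that a missing registry entry means the job has never been started.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

class LeadershipSketch {

    /** Statuses named in the issue text; a missing entry is assumed to mean "never started". */
    enum JobStatus { RUNNING, FINISHED }

    /** Hypothetical outcome of the check a runner performs when it is granted leadership. */
    enum Action { SCHEDULE, RECONCILE, EXIT }

    /** Hypothetical in-memory stand-in for the registry; a real one would use shared HA storage. */
    static class InMemoryRegistry {
        private final Map<String, JobStatus> statuses = new ConcurrentHashMap<>();

        Optional<JobStatus> getStatus(String jobId) {
            return Optional.ofNullable(statuses.get(jobId));
        }

        void setRunning(String jobId) { statuses.put(jobId, JobStatus.RUNNING); }

        void setFinished(String jobId) { statuses.put(jobId, JobStatus.FINISHED); }

        void clear(String jobId) { statuses.remove(jobId); }
    }

    /**
     * Decides what a runner does on gaining leadership.
     * Note: the check-then-set below is not atomic; it is safe here only under the
     * assumption that a single leader runs this logic at any time.
     */
    static Action onLeadershipGranted(InMemoryRegistry registry, String jobId) {
        Optional<JobStatus> status = registry.getStatus(jobId);
        if (!status.isPresent()) {
            // Point 1: first leadership grant, mark the job RUNNING and schedule it.
            registry.setRunning(jobId);
            return Action.SCHEDULE;
        }
        if (status.get() == JobStatus.RUNNING) {
            // Point 4: the previous JobManager failed while the job was running.
            return Action.RECONCILE;
        }
        // Point 3: another runner already finished the job.
        return Action.EXIT;
    }

    public static void main(String[] args) {
        InMemoryRegistry registry = new InMemoryRegistry();
        System.out.println(onLeadershipGranted(registry, "job-1")); // first grant
        System.out.println(onLeadershipGranted(registry, "job-1")); // grant after a failure
        registry.setFinished("job-1");                              // point 2: job finishes
        System.out.println(onLeadershipGranted(registry, "job-1")); // late grant to another runner
    }
}
```

The sketch keeps the status lookup and the resulting action in one method so that the three cases (no entry, RUNNING, FINISHED) are visible side by side; how the registry is persisted and when the entry is cleared are left open here, as in the description above.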