[ https://issues.apache.org/jira/browse/FLINK-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877569#comment-15877569 ]
ASF GitHub Bot commented on FLINK-5501:
---------------------------------------
GitHub user shuai-xu opened a pull request:
https://github.com/apache/flink/pull/3385
[FLINK-5501] JM use running job registry to determine whether is the first running
This PR is for [FLINK-5501](https://issues.apache.org/jira/browse/FLINK-5501).
The main changes are:
1. Add the methods isJobFinished() and clearJob() to the RunningJobsRegistry
interface and implement them.
2. After grantLeadership, the JMRunner first checks whether the job is
finished. If it is, another JM has already finished the job, so this JMRunner
only needs to exit.
3. Otherwise, the JMRunner checks whether the job is running. If it is,
another JM has run the job but did not complete it, so the JMRunner needs to
recover it.
4. If the job is not running, this is its first run, and the JMRunner calls
setJobRunning() on the RunningJobsRegistry.
5. After the job finishes, the job state is cleared from the
RunningJobsRegistry.
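The five steps above can be sketched as follows. This is a minimal illustration, not the code from the pull request: it assumes job IDs are plain strings (Flink uses JobID), it uses an in-memory map where a high-availability setup would need a store shared by all JobManagers, and the names JobRegistrySketch, InMemoryJobRegistry, LeadershipAction and GrantLeadershipSketch are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified registry interface; not Flink's RunningJobsRegistry API. */
interface JobRegistrySketch {
    void setJobRunning(String jobId);
    void setJobFinished(String jobId);
    boolean isJobRunning(String jobId);
    boolean isJobFinished(String jobId); // step 1: new query
    void clearJob(String jobId);         // step 1: new cleanup call
}

/** In-memory stand-in; only visible to JobManagers in the same JVM. */
class InMemoryJobRegistry implements JobRegistrySketch {
    private enum Status { RUNNING, FINISHED }

    private final Map<String, Status> statuses = new ConcurrentHashMap<>();

    public void setJobRunning(String jobId) { statuses.put(jobId, Status.RUNNING); }
    public void setJobFinished(String jobId) { statuses.put(jobId, Status.FINISHED); }
    // An absent entry means the job has never been marked, so both queries return false.
    public boolean isJobRunning(String jobId) { return statuses.get(jobId) == Status.RUNNING; }
    public boolean isJobFinished(String jobId) { return statuses.get(jobId) == Status.FINISHED; }
    public void clearJob(String jobId) { statuses.remove(jobId); }
}

/** What the runner does once it has been granted leadership. */
enum LeadershipAction { EXIT, RECOVER, SCHEDULE }

final class GrantLeadershipSketch {
    private GrantLeadershipSketch() {}

    static LeadershipAction onGrantLeadership(JobRegistrySketch registry, String jobId) {
        if (registry.isJobFinished(jobId)) {
            // Step 2: another JM already finished the job, so just exit.
            return LeadershipAction.EXIT;
        }
        if (registry.isJobRunning(jobId)) {
            // Step 3: a previous JM ran the job but did not complete it.
            return LeadershipAction.RECOVER;
        }
        // Step 4: first run, so mark the job as running before scheduling it.
        registry.setJobRunning(jobId);
        return LeadershipAction.SCHEDULE;
    }

    static void onJobFinished(JobRegistrySketch registry, String jobId) {
        registry.setJobFinished(jobId);
    }

    static void cleanUp(JobRegistrySketch registry, String jobId) {
        // Step 5. When exactly to clear (relative to other runners checking
        // isJobFinished()) is left open here; this sketch makes it a separate call.
        registry.clearJob(jobId);
    }
}
```

Granting leadership twice without a finish in between yields SCHEDULE and then RECOVER; after onJobFinished() the next grant yields EXIT, and cleanUp() removes all trace of the job.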
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/shuai-xu/flink jira-5501
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3385.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3385
----
commit 7c5068e7ea0592f3ba0527d3d363c7cf4653713d
Author: shuai.xus <[email protected]>
Date: 2017-02-22T06:15:43Z
[FLINK-5501] JM use running job registry to determine whether is the first running
----
> Determine whether the job starts from last JobManager failure
> -------------------------------------------------------------
>
> Key: FLINK-5501
> URL: https://issues.apache.org/jira/browse/FLINK-5501
> Project: Flink
> Issue Type: Sub-task
> Components: JobManager
> Reporter: zhijiang
> Assignee: shuai.xu
>
> When the {{JobManagerRunner}} is granted leadership, it should check whether the
> current job is already running. If the job is running, the
> {{JobManager}} should reconcile itself (enter the RECONCILING state) and wait
> for the {{TaskManager}}s to report task status. Otherwise, the {{JobManager}}
> can schedule the {{ExecutionGraph}} in the usual way.
> The {{RunningJobsRegistry}} can provide a way to check whether the job is
> running, but we should extend its current interface and adjust the related
> process to support this:
> 1. The {{RunningJobsRegistry}} sets the RUNNING status after the {{JobManagerRunner}}
> is granted leadership for the first time.
> 2. When the job finishes, its status is set to FINISHED in the
> {{RunningJobsRegistry}}, and the status is deleted before exit.
> 3. If the mini cluster starts multiple {{JobManagerRunner}}s and the leader
> {{JobManagerRunner}} has already finished the job and set its status to FINISHED,
> the other {{JobManagerRunner}}s will exit when they are granted leadership again.
> 4. If the {{JobManager}} fails, the job status remains RUNNING. So if a
> {{JobManagerRunner}} (the previous one or a new one) is granted leadership
> again, it checks the job status and enters the {{RECONCILING}} state.
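The four numbered items above amount to a small state machine shared by every JobManagerRunner of one cluster. The following is a minimal in-memory simulation of it, assuming a single shared map stands in for the real RunningJobsRegistry; SharedJobRegistry, GrantOutcome and JobManagerRunnerSketch are hypothetical names, and the check-then-set in grantLeadership() is only safe under the assumption that leader election allows one leader at a time.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry shared by all runners of one mini cluster. */
final class SharedJobRegistry {
    enum Status { RUNNING, FINISHED }

    private final Map<String, Status> statuses = new HashMap<>();

    synchronized Status get(String jobId) { return statuses.get(jobId); } // null if never set
    synchronized void set(String jobId, Status status) { statuses.put(jobId, status); }
    synchronized void clear(String jobId) { statuses.remove(jobId); }
}

/** Outcome of a leadership grant. */
enum GrantOutcome { SCHEDULE, RECONCILE, EXIT }

final class JobManagerRunnerSketch {
    private final SharedJobRegistry registry;
    private final String jobId;

    JobManagerRunnerSketch(SharedJobRegistry registry, String jobId) {
        this.registry = registry;
        this.jobId = jobId;
    }

    GrantOutcome grantLeadership() {
        SharedJobRegistry.Status status = registry.get(jobId);
        if (status == SharedJobRegistry.Status.FINISHED) {
            return GrantOutcome.EXIT;      // item 3: another leader already finished the job
        }
        if (status == SharedJobRegistry.Status.RUNNING) {
            return GrantOutcome.RECONCILE; // item 4: wait for TaskManagers to report task status
        }
        // Item 1: first grant ever, so record RUNNING and schedule the ExecutionGraph.
        registry.set(jobId, SharedJobRegistry.Status.RUNNING);
        return GrantOutcome.SCHEDULE;
    }

    void jobFinished() {
        registry.set(jobId, SharedJobRegistry.Status.FINISHED); // item 2
    }
}
```

For example, if one runner schedules the job and then fails, a second runner granted leadership sees RUNNING and reconciles; once that runner marks the job FINISHED, any runner granted leadership afterwards exits, and clearing the entry returns the registry to its initial state.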
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)