[ 
https://issues.apache.org/jira/browse/FLINK-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877569#comment-15877569
 ] 

ASF GitHub Bot commented on FLINK-5501:
---------------------------------------

GitHub user shuai-xu opened a pull request:

    https://github.com/apache/flink/pull/3385

    [FLINK-5501] JM use running job registry to determine whether is the first 
running

    This PR is for jira-#[5501](https://issues.apache.org/jira/browse/FLINK-5501).
    
    The main changes are:
    1. Add the methods isJobFinished() and clearJob() to the RunningJobsRegistry 
interface and implement them.
    2. After grantLeadership, the JMRunner first checks whether the job is 
finished. If it is, another JM has already finished the job, so this JMRunner 
only needs to exit.
    3. Otherwise, the JMRunner checks whether the job is running. If it is, 
another JM started the job but did not complete it, so this JMRunner needs to 
recover it.
    4. If the job is not running, this is the first run, and the JMRunner calls 
setJobRunning on the RunningJobsRegistry.
    5. After the job finishes, its state is cleared from the RunningJobsRegistry.
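
The decision order in steps 2 to 4 can be sketched roughly as follows. This is only an illustration, not the code in the pull request: `JobRegistrySketch`, `InMemoryJobRegistrySketch`, `LeaderAction` and `LeadershipDecisionSketch` are hypothetical names, job IDs are plain strings, and the method signatures are assumed from the method names mentioned above.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the registry; method names follow the PR text, signatures are assumed. */
interface JobRegistrySketch {
    boolean isJobRunning(String jobId);
    boolean isJobFinished(String jobId);
    void setJobRunning(String jobId);
    void setJobFinished(String jobId);
    void clearJob(String jobId);
}

/** In-memory implementation, for illustration only (a real registry would be shared across JMs). */
final class InMemoryJobRegistrySketch implements JobRegistrySketch {
    private final Set<String> running = new HashSet<>();
    private final Set<String> finished = new HashSet<>();

    public boolean isJobRunning(String jobId) { return running.contains(jobId); }
    public boolean isJobFinished(String jobId) { return finished.contains(jobId); }
    public void setJobRunning(String jobId) { finished.remove(jobId); running.add(jobId); }
    public void setJobFinished(String jobId) { running.remove(jobId); finished.add(jobId); }
    public void clearJob(String jobId) { running.remove(jobId); finished.remove(jobId); }
}

/** What a runner does after it is granted leadership (hypothetical enum). */
enum LeaderAction { EXIT, RECOVER, START_FIRST_RUN }

final class LeadershipDecisionSketch {
    static LeaderAction onGrantLeadership(JobRegistrySketch registry, String jobId) {
        // Step 2: another JM already finished the job, nothing left to do.
        if (registry.isJobFinished(jobId)) {
            return LeaderAction.EXIT;
        }
        // Step 3: another JM started the job but did not complete it.
        if (registry.isJobRunning(jobId)) {
            return LeaderAction.RECOVER;
        }
        // Step 4: first run, so record that the job is now running.
        registry.setJobRunning(jobId);
        return LeaderAction.START_FIRST_RUN;
    }
}
```

Checking "finished" before "running" matters in this sketch: a runner that gains leadership late should exit without touching the job, even if a stale running marker were still present.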

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shuai-xu/flink jira-5501

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3385.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3385
    
----
commit 7c5068e7ea0592f3ba0527d3d363c7cf4653713d
Author: shuai.xus <[email protected]>
Date:   2017-02-22T06:15:43Z

    [FLINK-5501] JM use running job registry to determine whether is the first 
running

----


> Determine whether the job starts from last JobManager failure
> -------------------------------------------------------------
>
>                 Key: FLINK-5501
>                 URL: https://issues.apache.org/jira/browse/FLINK-5501
>             Project: Flink
>          Issue Type: Sub-task
>          Components: JobManager
>            Reporter: zhijiang
>            Assignee: shuai.xu
>
> When the {{JobManagerRunner}} is granted leadership, it should check whether the 
> current job is already running. If the job is running, the 
> {{JobManager}} should reconcile itself (enter the RECONCILING state) and wait 
> for the {{TaskManager}} to report task status. Otherwise the {{JobManager}} 
> can schedule the {{ExecutionGraph}} in the usual way.
> The {{RunningJobsRegistry}} can provide a way to check the job's running 
> status, but we should extend the current interface and fix the related 
> process to support this function.
> 1. {{RunningJobsRegistry}} sets the RUNNING status after the {{JobManagerRunner}} 
> is granted leadership for the first time.
> 2. If the job finishes, {{RunningJobsRegistry}} sets the job status to FINISHED, 
> and the status is deleted before exit. 
> 3. If the mini cluster starts multiple {{JobManagerRunner}}s and the leading 
> {{JobManagerRunner}} has already finished the job and set its status to FINISHED, 
> the other {{JobManagerRunner}}s will exit when they are granted leadership.
> 4. If the {{JobManager}} fails, the job status remains RUNNING. So 
> if a {{JobManagerRunner}} (the previous one or a new one) is granted leadership 
> again, it checks the job status and enters the {{RECONCILING}} state.
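
The status lifecycle described in the issue (no entry, then RUNNING, then FINISHED, then deleted) could be modelled as a single status per job. The sketch below is an assumption-laden illustration, not Flink's implementation: `RegistryJobStatus` and `StatusRegistrySketch` are hypothetical names, job IDs are plain strings, and the store is an in-memory map rather than a shared high-availability service.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical status values; the issue text names RUNNING and FINISHED. */
enum RegistryJobStatus { RUNNING, FINISHED }

/** In-memory sketch of a status-tracking registry; an absent entry means "never started or cleared". */
final class StatusRegistrySketch {
    private final Map<String, RegistryJobStatus> statusByJob = new ConcurrentHashMap<>();

    /** Item 1: called when a runner is granted leadership for the first time. */
    void setRunning(String jobId) {
        statusByJob.put(jobId, RegistryJobStatus.RUNNING);
    }

    /** Item 2: called when the job finishes. */
    void setFinished(String jobId) {
        statusByJob.put(jobId, RegistryJobStatus.FINISHED);
    }

    /** Item 2: called before exit to delete the status. */
    void clear(String jobId) {
        statusByJob.remove(jobId);
    }

    /** Empty if the job has no entry. */
    Optional<RegistryJobStatus> status(String jobId) {
        return Optional.ofNullable(statusByJob.get(jobId));
    }
}
```

In this model a failed JobManager never reaches `setFinished` or `clear`, so the entry stays at RUNNING. That leftover value is what the next leader would read in item 4 before entering the RECONCILING state.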



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
