[jira] [Commented] (FLINK-5501) Determine whether the job starts from last JobManager failure

ASF GitHub Bot (JIRA) Fri, 24 Feb 2017 09:45:00 -0800

    [ https://issues.apache.org/jira/browse/FLINK-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883175#comment-15883175 ]


ASF GitHub Bot commented on FLINK-5501:
---------------------------------------

Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3385#discussion_r102993092
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/highavailability/ZookeeperRegistry.java ---
    @@ -55,7 +59,7 @@ public void setJobRunning(JobID jobID) throws IOException {
                try {
                        String zkPath = runningJobPath + jobID.toString();
                        this.client.newNamespaceAwareEnsurePath(zkPath).ensure(client.getZookeeperClient());
    -                   this.client.setData().forPath(zkPath);
    +                   this.client.setData().forPath(zkPath, RUNNING.getBytes());
    --- End diff --
    
    String to bytes conversion (and bytes to string) must always explicitly 
specify the encoding (Charset). Otherwise, there can be mismatches when 
different machines configure different default Charsets.
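
    As a minimal sketch of the point above (not code from the pull request):
    the class StatusCodec, its methods, and the value of the RUNNING marker
    are hypothetical names chosen for illustration. It assumes UTF-8 is the
    agreed encoding; any fixed Charset would serve, as long as the writer and
    the reader use the same one.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the pull request: it only illustrates
// encoding and decoding a status marker with an explicit Charset.
final class StatusCodec {
    // Assumed marker value; the diff only shows a constant named RUNNING.
    static final String RUNNING = "RUNNING";

    // Encode with a fixed Charset instead of the platform default.
    static byte[] encode(String status) {
        return status.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same Charset that was used for encoding.
    static String decode(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = encode(RUNNING);
        // The round trip is independent of the JVM's default Charset.
        System.out.println(decode(data).equals(RUNNING));
    }
}
```

    With a helper like this, the changed line in the diff could pass
    StatusCodec.encode(RUNNING) rather than RUNNING.getBytes(), so the bytes
    stored under zkPath would not depend on each machine's configuration.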


> Determine whether the job starts from last JobManager failure
> -------------------------------------------------------------
>
>                 Key: FLINK-5501
>                 URL: https://issues.apache.org/jira/browse/FLINK-5501
>             Project: Flink
>          Issue Type: Sub-task
>          Components: JobManager
>            Reporter: zhijiang
>            Assignee: shuai.xu
>
> When the {{JobManagerRunner}} is granted leadership, it should check whether the 
> current job is already running. If the job is running, the 
> {{JobManager}} should reconcile itself (enter the RECONCILING state) and wait 
> for the {{TaskManager}} to report task status. Otherwise, the {{JobManager}} 
> can schedule the {{ExecutionGraph}} in the usual way.
> The {{RunningJobsRegistry}} can provide a way to check the job's running 
> status, but we should extend the current interface and fix the related 
> process to support this function.
> 1. {{RunningJobsRegistry}} sets the RUNNING status when the {{JobManagerRunner}} 
> is granted leadership for the first time.
> 2. If the job finishes, {{RunningJobsRegistry}} sets the job status to 
> FINISHED, and the status is deleted before exit. 
> 3. If the mini cluster starts multiple {{JobManagerRunner}}s and the leader 
> {{JobManagerRunner}} has already finished the job and set the job status to FINISHED, 
> the other {{JobManagerRunner}}s will exit when they are granted leadership.
> 4. If the {{JobManager}} fails, the job status remains RUNNING. So 
> if a {{JobManagerRunner}} (the previous one or a new one) is granted leadership 
> again, it will check the job status and enter the {{RECONCILING}} state.
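
The four numbered cases in the description can be sketched as a small decision
function. This is an in-memory illustration under stated assumptions, not
Flink's RunningJobsRegistry: the class RegistrySketch, the JobStatus and Action
enums, and all method names are hypothetical, and a real implementation would
keep the status in ZooKeeper rather than in a map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the registry described in the issue.
// All names here are illustrative assumptions, not Flink's API.
final class RegistrySketch {
    enum JobStatus { RUNNING, FINISHED }
    enum Action { SCHEDULE, RECONCILE, EXIT }

    private final Map<String, JobStatus> statuses = new ConcurrentHashMap<>();

    // Called when a runner is granted leadership for the given job.
    Action onLeadershipGranted(String jobId) {
        // Atomically mark the job RUNNING if no status exists yet (case 1).
        JobStatus previous = statuses.putIfAbsent(jobId, JobStatus.RUNNING);
        if (previous == null) {
            return Action.SCHEDULE;
        }
        if (previous == JobStatus.FINISHED) {
            // Another runner already finished the job (case 3).
            return Action.EXIT;
        }
        // Status is still RUNNING, so an earlier leader failed mid-job (case 4).
        return Action.RECONCILE;
    }

    // Called when the job finishes (first half of case 2).
    void onJobFinished(String jobId) {
        statuses.put(jobId, JobStatus.FINISHED);
    }

    // Called before exit to delete the status (second half of case 2).
    void clear(String jobId) {
        statuses.remove(jobId);
    }
}
```

In this sketch, a first grant for a job returns SCHEDULE, a second grant
without an intervening finish returns RECONCILE, and a grant after
onJobFinished returns EXIT.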



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
