[ https://issues.apache.org/jira/browse/TEZ-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294639#comment-14294639 ]

Jeff Zhang commented on TEZ-1895:
---------------------------------

[~hitesh] Uploaded a new patch.
* Added a new test that checks the DAG fails if a vertex re-runs after the
vertex group commit has run.
* Two minor changes to the patch:
** Added one more termination-cause check (COMMIT_FAILURE) to the vertex
completion check:
{code}
      else if (vertex.terminationCause == VertexTerminationCause.COMMIT_FAILURE) {
        vertex.setFinishTime();
        String diagnosticMsg = "Vertex failed/killed due to COMMIT_FAILURE failed. "
            + "failedTasks:"
            + vertex.failedTaskCount
            + " killedTasks:"
            + vertex.killedTaskCount;
        LOG.info(diagnosticMsg);
        vertex.abortVertex(State.FAILED);
        return vertex.finished(VertexState.FAILED);
      }
{code}
** Decrement DAG::numCompletedVertices only when DAG::vertexReRunning returns
false, i.e. when the re-run has not failed the DAG. Otherwise the DAG would
never finish, because DAG::numCompletedVertices would stay below the total
number of vertices.
{code}
      boolean failed = job.vertexReRunning(vertex);
      if (!failed) {
        job.numCompletedVertices--;
      }
{code}
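The counter fix above can be illustrated with a small Java toy model. It is a sketch under stated assumptions, not Tez code: DagCompletionModel, DagCompletionDemo and their methods are hypothetical; the `failed` argument stands in for the value returned by DAG::vertexReRunning (assumed true when the re-run fails the DAG); and the model assumes the DAG can only finish once every vertex is counted as completed, and that a vertex whose re-run fails the DAG reports no further completion.

```java
// Toy model only: class and method names are hypothetical, not Tez APIs.
class DagCompletionModel {
    final int totalVertices;
    int numCompletedVertices;
    boolean dagFailed;

    DagCompletionModel(int totalVertices) {
        this.totalVertices = totalVertices;
    }

    // A vertex reached a terminal state (succeeded, failed or killed).
    void onVertexCompleted() {
        numCompletedVertices++;
    }

    // Patched behaviour: 'failed' stands in for the value returned by
    // DAG::vertexReRunning (assumed true when the re-run fails the DAG).
    void onVertexReRunning(boolean failed) {
        if (failed) {
            dagFailed = true;          // vertex stays counted as completed
        } else {
            numCompletedVertices--;    // vertex is running again
        }
    }

    // Pre-patch behaviour for comparison: always decrement.
    void onVertexReRunningUnpatched(boolean failed) {
        if (failed) {
            dagFailed = true;
        }
        numCompletedVertices--;
    }

    // Assumption: the DAG can reach a terminal state only once every
    // vertex is counted as completed.
    boolean canFinish() {
        return numCompletedVertices == totalVertices;
    }
}

class DagCompletionDemo {
    public static void main(String[] args) {
        // Both vertices finish, the group commit runs, then one vertex re-runs.
        DagCompletionModel patched = new DagCompletionModel(2);
        patched.onVertexCompleted();
        patched.onVertexCompleted();
        patched.onVertexReRunning(true);

        DagCompletionModel unpatched = new DagCompletionModel(2);
        unpatched.onVertexCompleted();
        unpatched.onVertexCompleted();
        unpatched.onVertexReRunningUnpatched(true);

        System.out.println("patched canFinish=" + patched.canFinish());     // true
        System.out.println("unpatched canFinish=" + unpatched.canFinish()); // false
    }
}
```

In this model the unpatched path leaves the completed count at 1 of 2 after the DAG has already failed, so under the stated assumption nothing ever brings it back to 2; that is the hang the second change avoids.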


> Vertex reRunning should decrease successfulMembers of VertexGroupInfo
> ---------------------------------------------------------------------
>
>                 Key: TEZ-1895
>                 URL: https://issues.apache.org/jira/browse/TEZ-1895
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>         Attachments: TEZ-1895-1.patch, TEZ-1895-2.patch
>
>
> Vertex reRunning should decrease successfulMembers of VertexGroupInfo;
> otherwise the vertex group commit may happen while the vertex is still re-running.
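The issue's core point, that a re-running member must stop counting towards its group's commit, can be sketched as a Java toy model. VertexGroupModel, VertexGroupDemo and their methods are hypothetical; only successfulMembers echoes a field named in the issue, and starting the commit once successfulMembers equals the member count is an assumption about how the group commit is gated, not a description of the Tez implementation.

```java
import java.util.Set;

// Toy model of a vertex group's commit bookkeeping (hypothetical names;
// only successfulMembers mirrors a field mentioned in the issue).
class VertexGroupModel {
    final Set<String> members;
    int successfulMembers;
    boolean commitStarted;

    VertexGroupModel(Set<String> members) {
        this.members = members;
    }

    // A member vertex succeeded; assume the group commit starts once
    // every member is counted as successful.
    void onMemberSucceeded(String vertex) {
        if (!members.contains(vertex)) return;
        successfulMembers++;
        if (successfulMembers == members.size()) {
            commitStarted = true;
        }
    }

    // A member vertex re-runs. Returns true if the group commit has already
    // started, in which case the DAG has to fail instead.
    boolean onMemberReRunning(String vertex) {
        if (!members.contains(vertex)) return false;
        if (commitStarted) return true;
        successfulMembers--;   // the fix: a re-running member no longer counts
        return false;
    }
}

class VertexGroupDemo {
    public static void main(String[] args) {
        VertexGroupModel group = new VertexGroupModel(Set.of("v1", "v2"));
        group.onMemberSucceeded("v1");
        group.onMemberReRunning("v1");   // v1 re-runs before the commit
        group.onMemberSucceeded("v2");
        // Only v2 counts now, so the commit must not start yet.
        System.out.println("after v2: commitStarted=" + group.commitStarted);
        group.onMemberSucceeded("v1");   // v1 finishes its re-run
        System.out.println("after v1 re-run: commitStarted=" + group.commitStarted);
    }
}
```

Without the decrement in onMemberReRunning, the count would already reach 2 when v2 succeeds, and the commit would start while v1 is still re-running.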



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)