[ 
https://issues.apache.org/jira/browse/TEZ-2412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528986#comment-14528986
 ] 

Bikas Saha commented on TEZ-2412:
---------------------------------

Instead of doing this, should we call checkVerticesForCompletion() in the else 
block. This way this transition will not go out of sync with any future checks 
added in checkVerticesforCompletion(). E.g. today the check is 
completedCount=TotalCount but later it could be different.
{code}
     } else {
-      if (!dag.commitFutures.isEmpty()) {
-        // pending commits are running
+      if (!dag.commitFutures.isEmpty() || dag.numCompletedVertices != 
dag.numVertices) {
+        // pending commits are running or still some vertices are not completed
         return DAGState.TERMINATING;
       } else {
         return finishWithTerminationCause(dag); <<<<<< replace this with 
checkVerticesForCompletion. So if there is an error with terminationCause set, 
then this transition only changes the commitFutures and uses common logic to 
handle the other checks for completion on error.
{code}
Similarly this could be changed
{code}
    if (dag.terminationCause == null) {
      Preconditions.checkState(dag.getState() == DAGState.COMMITTING,
          "DAG should be in COMMITTING state, but in " + dag.getState());
      if (!dag.commitFutures.isEmpty()) {
        // pending commits are running
        return DAGState.COMMITTING;
      } else {
        return dag.finished(DAGState.SUCCEEDED); <<< return 
checkVerticesForCompletion() which does the same thing.
      }{code}

Thoughts?

If you dont think this is the correct approach, then we can commit the current 
patch with a comment in checkVerticesForCompletion warning that changes in that 
transition require an inspection/update of checkCommitsForCompletion for 
similar changes.

> Should kill vertex in DAGImpl#VertexRerunWhileCommitting
> --------------------------------------------------------
>
>                 Key: TEZ-2412
>                 URL: https://issues.apache.org/jira/browse/TEZ-2412
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>             Fix For: 0.7.0
>
>         Attachments: TEZ-2412-1.patch
>
>
> * When vertex rerun, it move to RUNNING state, so should kill it in 
> DAGImpl#VertexRerunWhileCommitting



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to