[ 
https://issues.apache.org/jira/browse/TEZ-973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947328#comment-13947328
 ] 

Siddharth Seth commented on TEZ-973:
------------------------------------

{code}
       endState = DAGState.ERROR;
{code}
DAGImpl sets the state to ERROR during recovery. Should that just be FAILED?

DAGCommitStartedEvent - the failure reason gets reported as COMMIT_FAILED. 
Should this be INTERNAL_ERROR, to be consistent with VertexCommitFailure? 
The same applies to VertexGroupCommitStarted / VertexGroupCommitFinished.

A vertex history event write failure is putting the DAG into the ERROR state. 
FAILED with a different cause seems more appropriate; that's consistent with 
critical summary failures causing the DAG to stay in its current state or be 
marked as FAILED/KILLED. INTERNAL_ERROR remains as a means of indicating likely 
bugs in the state machines.
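
A minimal sketch of the distinction suggested above, to make the proposal concrete. The enums and the terminate() helper are simplified, hypothetical stand-ins and not the actual Tez classes; the assumption is only that an operational failure (such as a history event write failure) ends in FAILED with its own cause, while ERROR is reserved for INTERNAL_ERROR.

```java
public class DagTerminationSketch {
    // Hypothetical, simplified stand-ins; not the real Tez state or cause enums.
    enum State { RUNNING, FAILED, KILLED, ERROR }
    enum Cause { COMMIT_FAILURE, RECOVERY_FAILURE, HISTORY_EVENT_WRITE_FAILURE, INTERNAL_ERROR }

    // Final state of a DAG together with the reason it terminated.
    static final class Outcome {
        final State state;
        final Cause cause;

        Outcome(State state, Cause cause) {
            this.state = state;
            this.cause = cause;
        }
    }

    // Operational failures map to FAILED with a specific cause;
    // only INTERNAL_ERROR (a likely state machine bug) maps to ERROR.
    static Outcome terminate(Cause cause) {
        State state = (cause == Cause.INTERNAL_ERROR) ? State.ERROR : State.FAILED;
        return new Outcome(state, cause);
    }

    public static void main(String[] args) {
        Outcome o = terminate(Cause.HISTORY_EVENT_WRITE_FAILURE);
        System.out.println(o.state + " (" + o.cause + ")");
    }
}
```

With this shape, callers can still tell failures apart by cause without treating every one of them as a state machine bug.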

> Abort additional attempts if recovery fails.
> --------------------------------------------
>
>                 Key: TEZ-973
>                 URL: https://issues.apache.org/jira/browse/TEZ-973
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Assignee: Hitesh Shah
>         Attachments: TEZ-973.1.patch, TEZ-973.2.patch, TEZ-973.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)
