[ https://issues.apache.org/jira/browse/TEZ-973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947328#comment-13947328 ]
Siddharth Seth commented on TEZ-973:
------------------------------------

{code}
endState = DAGState.ERROR;
{code}
DAGImpl - during recovery this sets the state to ERROR. Should that just be FAILED?

DAGCommitStartedEvent - the failure reason gets reported as COMMIT_FAILED. Should this be INTERNAL_ERROR, to be consistent with VertexComitFailure? Similarly for VertexGroupCommitStarted / VertexGroupCommitFinished.

A vertex history event write failure is putting the DAG into the ERROR state. FAILED with a different cause seems more appropriate - that's consistent with critical summary failures causing the DAG to stay in its current state or be marked as FAILED/KILLED. INTERNAL_ERROR remains as a means of indicating likely bugs in the state machines.

> Abort additional attempts if recovery fails.
> --------------------------------------------
>
>                 Key: TEZ-973
>                 URL: https://issues.apache.org/jira/browse/TEZ-973
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Assignee: Hitesh Shah
>         Attachments: TEZ-973.1.patch, TEZ-973.2.patch, TEZ-973.3.patch
>
>

--
This message was sent by Atlassian JIRA
(v6.2#6252)