[
https://issues.apache.org/jira/browse/TEZ-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294639#comment-14294639
]
Jeff Zhang commented on TEZ-1895:
---------------------------------
[~hitesh] Uploaded a new patch.
* Added a new test that checks the DAG fails if a vertex re-runs after the
vertex group commit has run.
* 2 minor changes in the patch:
** Added one more termination cause check for DAG completion
{code}
else if (vertex.terminationCause == VertexTerminationCause.COMMIT_FAILURE) {
  vertex.setFinishTime();
  String diagnosticMsg = "Vertex failed/killed due to COMMIT_FAILURE failed. "
      + "failedTasks:"
      + vertex.failedTaskCount
      + " killedTasks:"
      + vertex.killedTaskCount;
  LOG.info(diagnosticMsg);
  vertex.abortVertex(State.FAILED);
  return vertex.finished(VertexState.FAILED);
}
{code}
** Decrease DAG::numCompletedVertices only when DAG::vertexReRunning does not
fail the DAG (i.e. it returns false, as in the snippet below). Otherwise the
DAG would never finish, because DAG::numCompletedVertices would always be less
than the total number of vertices.
{code}
boolean failed = job.vertexReRunning(vertex);
if (!failed) {
  job.numCompletedVertices--;
}
{code}
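To illustrate how the two counters interact, here is a minimal, self-contained sketch. It is not the Tez code: the classes RerunSketch, GroupInfo and Dag and the method onVertexReRunning are hypothetical stand-ins. It also assumes that vertexReRunning returns true exactly when the vertex group has already committed, so that the re-run fails the DAG; that return-value meaning is inferred from the snippet above and not confirmed here.

```java
// Hypothetical, simplified stand-ins for illustration; not the real Tez classes.
class RerunSketch {

    /** Simplified vertex group: successful member count plus a committed flag. */
    static class GroupInfo {
        int successfulMembers;
        boolean committed;
    }

    /** Simplified DAG-level bookkeeping. */
    static class Dag {
        final int totalVertices;
        int numCompletedVertices;
        boolean dagFailed;

        Dag(int totalVertices) {
            this.totalVertices = totalVertices;
        }

        /**
         * Assumption: returns true if the re-run fails the DAG because the
         * group output was already committed; otherwise the group loses one
         * successful member and the method returns false.
         */
        boolean vertexReRunning(GroupInfo group) {
            if (group.committed) {
                dagFailed = true;
                return true;
            }
            group.successfulMembers--;
            return false;
        }

        /** Mirrors the snippet above: only undo the completion if the DAG did not fail. */
        void onVertexReRunning(GroupInfo group) {
            boolean failed = vertexReRunning(group);
            if (!failed) {
                numCompletedVertices--;
            }
        }

        /** The DAG can only finish once every vertex is counted as completed. */
        boolean allVerticesCompleted() {
            return numCompletedVertices == totalVertices;
        }
    }

    public static void main(String[] args) {
        // Case 1: re-run before the group commit.
        Dag dag = new Dag(2);
        dag.numCompletedVertices = 2;
        GroupInfo group = new GroupInfo();
        group.successfulMembers = 2;
        dag.onVertexReRunning(group);
        System.out.println("before commit: successfulMembers=" + group.successfulMembers
                + " numCompletedVertices=" + dag.numCompletedVertices
                + " dagFailed=" + dag.dagFailed);

        // Case 2: re-run after the group commit.
        Dag dag2 = new Dag(2);
        dag2.numCompletedVertices = 2;
        GroupInfo committedGroup = new GroupInfo();
        committedGroup.successfulMembers = 2;
        committedGroup.committed = true;
        dag2.onVertexReRunning(committedGroup);
        System.out.println("after commit: successfulMembers=" + committedGroup.successfulMembers
                + " numCompletedVertices=" + dag2.numCompletedVertices
                + " dagFailed=" + dag2.dagFailed);
    }
}
```

In the first case both counters drop by one, so the group cannot commit while a member is still re-running. In the second case the DAG is marked failed and numCompletedVertices is left untouched, so the completion check can still be reached.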
> Vertex reRunning should decrease successfulMembers of VertexGroupInfo
> ---------------------------------------------------------------------
>
> Key: TEZ-1895
> URL: https://issues.apache.org/jira/browse/TEZ-1895
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Attachments: TEZ-1895-1.patch, TEZ-1895-2.patch
>
>
> Vertex reRunning should decrease successfulMembers of VertexGroupInfo,
> otherwise the commit may happen while the vertex is still re-running.