[
https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486038#comment-14486038
]
Bikas Saha edited comment on TEZ-714 at 4/8/15 9:06 PM:
--------------------------------------------------------
abortVertex() should probably be called for all non-succeeded states? Similar
to DAGImpl. Otherwise TaskCompletedAfterVertexSuccessTransition will have a
regression.
{code} case ERROR:
addDiagnostic("Vertex: " + logIdentifier + " error due to:" +
terminationCause);
if (!StringUtils.isEmpty(diag)) {
addDiagnostic(diag);
}
eventHandler.handle(new DAGEvent(getDAGId(),
DAGEventType.INTERNAL_ERROR));
try {
logJobHistoryVertexFailedEvent(finalState);
} catch (IOException e) {
LOG.error("Failed to send vertex finished event to recovery", e);
}
break;
case KILLED:
case FAILED:
addDiagnostic("Vertex " + logIdentifier + " killed/failed due to:" +
terminationCause);
if (!StringUtils.isEmpty(diag)) {
addDiagnostic(diag);
}
abortVertex(VertexStatus.State.valueOf(finalState.name()));
{code}
testCommitOutputOnVertexSuccess() still has
{code} // both committers fail
DAG dag4 = createDAG("testDAGBothCommitsFail", false, true); <<< true/true
{code}
Mis-written comment? v3 is killed due to...
{code} // killed is failed due to the commit failure of the vertex group
(v1,v2)
Assert.assertEquals(VertexState.KILLED, v3.getState());
Assert.assertEquals(VertexTerminationCause.OTHER_VERTEX_FAILURE,
{code}
The test is not checking/verifying that one of the commits was indeed not
scheduled on the threadpool and cancelled before that. Maybe set a flag when
the callable is scheduled and check that the flag was not set. Also perhaps set
a flag when interrupted exception is caught when the scheduled event got
canceled.
{code}
// test commit will be canceled no matter it is started or still in the
threadpool
@Test(timeout = 5000)
public void testCommitCanceled() throws Exception {
{code}
Please cleanup the findbugs warnings.
Spoke offline with [~hitesh] about aborting all output after any failure and it
seems like the right thing to do.
was (Author: bikassaha):
abortVertex() should probably be called for all non-succeeded states? Other
TaskCompletedAfterVertexSuccessTransition will have a regression.
{code} case ERROR:
addDiagnostic("Vertex: " + logIdentifier + " error due to:" +
terminationCause);
if (!StringUtils.isEmpty(diag)) {
addDiagnostic(diag);
}
eventHandler.handle(new DAGEvent(getDAGId(),
DAGEventType.INTERNAL_ERROR));
try {
logJobHistoryVertexFailedEvent(finalState);
} catch (IOException e) {
LOG.error("Failed to send vertex finished event to recovery", e);
}
break;
case KILLED:
case FAILED:
addDiagnostic("Vertex " + logIdentifier + " killed/failed due to:" +
terminationCause);
if (!StringUtils.isEmpty(diag)) {
addDiagnostic(diag);
}
abortVertex(VertexStatus.State.valueOf(finalState.name()));
{code}
testCommitOutputOnVertexSuccess() still has
{code} // both committers fail
DAG dag4 = createDAG("testDAGBothCommitsFail", false, true); <<< true/true
{code}
Mis-written comment? v3 is killed due to...
{code} // killed is failed due to the commit failure of the vertex group
(v1,v2)
Assert.assertEquals(VertexState.KILLED, v3.getState());
Assert.assertEquals(VertexTerminationCause.OTHER_VERTEX_FAILURE,
{code}
The test is not checking/verifying that one of the commits was indeed not
scheduled on the threadpool and cancelled before that. Maybe set a flag when
the callable is scheduled and check that the flag was not set. Also perhaps set
a flag when interrupted exception is caught when the scheduled event got
canceled.
{code}
// test commit will be canceled no matter it is started or still in the
threadpool
@Test(timeout = 5000)
public void testCommitCanceled() throws Exception {
{code}
Please cleanup the findbugs warnings.
Spoke offline with [~hitesh] about aborting all output after any failure and it
seems like the right thing to do.
> OutputCommitters should not run in the main AM dispatcher thread
> ----------------------------------------------------------------
>
> Key: TEZ-714
> URL: https://issues.apache.org/jira/browse/TEZ-714
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Jeff Zhang
> Priority: Critical
> Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-10.patch,
> TEZ-714-2.patch, TEZ-714-3.patch, TEZ-714-4.patch, TEZ-714-5.patch,
> TEZ-714-6.patch, TEZ-714-7.patch, TEZ-714-8.patch, TEZ-714-9.patch,
> Vertex_2.pdf
>
>
> Follow up jira from TEZ-41.
> 1) If there's multiple OutputCommitters on a Vertex, they can be run in
> parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event
> handling, w.r.t the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)