[
https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383586#comment-14383586
]
Jeff Zhang commented on TEZ-714:
--------------------------------
Upload new patch to address the review comment.
* Why is this calling Vertex.abortVertex() instead of directly calling
committer.abort() for the outputs?
Several reasons.
** reuse the abort code of vertex
** there's one bug in the existing code that it would abort the committed
output when dag is failed, but it is supposed not to abort in this case.
* Both for DAG & Vertex wrap all the commit related code into one
CommitCompletedTransition
* bq. Should we wait for all outstanding commit operations to get cancelled or
complete and then call abort on all outputs?
It is done in the new patch except when internal error happens. Still use the
original InternaErrorTransition
** For Vertex, when commit completed event and task completed event happens, it
would call checkVertexForCompletion
** For DAG, when commit completed event and vertex completed event happens, it
would call checkDAGForCompletion
* Add unit test in TestCommit & e2e test in TestMockDAGAppMaster
* Maybe an issue: when vertex's output is being committing as an vertex group
output, then what state should this vertex in ? Currently vertex will go to
SUCCEEDED state, but may be better to move to COMMITTING state
* Should Task wait for all the task attempts completed before move to
SUCCEEDED, otherwise it is possible that vertex is in COMMITTING while there
still task attempt is still running in the case of speculation.
> OutputCommitters should not run in the main AM dispatcher thread
> ----------------------------------------------------------------
>
> Key: TEZ-714
> URL: https://issues.apache.org/jira/browse/TEZ-714
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Jeff Zhang
> Priority: Critical
> Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch,
> TEZ-714-3.patch, TEZ-714-4.patch, TEZ-714-5.patch, Vertex_2.pdf
>
>
> Follow up jira from TEZ-41.
> 1) If there's multiple OutputCommitters on a Vertex, they can be run in
> parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event
> handling, w.r.t the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)