[
https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482492#comment-14482492
]
Jeff Zhang commented on TEZ-714:
--------------------------------
[~bikassaha]
I uploaded a new patch that refactors checkForCompletion().
In addition, I removed the boolean field committed from both DAG and Vertex,
because I think it is impossible for commit to be invoked multiple times. It
is, however, possible for the abort operation to be invoked multiple times
(for example, when moving from FAILED to ERROR).
bq. Will calling cancel on the future object result in
VertexCommitCallback#onFailure() being invoked? If not, then the vertex will
hang waiting for commitFutures to be empty, because no CommitCompleted
event will come.
Yes, canceling the future object results in VertexCommitCallback#onFailure()
being invoked, so commitFutures will eventually be empty.
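To illustrate the point above, here is a minimal sketch of a callback that
fires on cancellation and removes the future from the pending set. It is not
the Tez code: it uses the JDK's CompletableFuture as a stand-in for the future
type used in the patch, and the class CommitCancelSketch, the method track()
and the events list are hypothetical names made up for this example. Only the
name commitFutures is borrowed from the discussion.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

class CommitCancelSketch {
    // Hypothetical stand-in for the vertex's collection of in-flight commits.
    final Set<CompletableFuture<Void>> commitFutures = ConcurrentHashMap.newKeySet();
    // Records which callback path ran; for illustration only.
    final List<String> events = new CopyOnWriteArrayList<>();

    // Registers a pending commit and attaches a completion callback to it.
    CompletableFuture<Void> track(String outputName) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        commitFutures.add(future);
        future.whenComplete((ignored, error) -> {
            // Runs on success, failure and cancellation alike,
            // so the pending set always drains.
            commitFutures.remove(future);
            if (error == null) {
                events.add("onSuccess:" + outputName);
            } else if (error instanceof CancellationException) {
                events.add("onFailure(cancelled):" + outputName);
            } else {
                events.add("onFailure:" + outputName);
            }
        });
        return future;
    }

    public static void main(String[] args) {
        CommitCancelSketch sketch = new CommitCancelSketch();
        CompletableFuture<Void> a = sketch.track("output-a");
        CompletableFuture<Void> b = sketch.track("output-b");
        a.cancel(true);   // takes the failure path of the callback
        b.complete(null); // takes the success path of the callback
        System.out.println(sketch.events);
        System.out.println("pending commits: " + sketch.commitFutures.size());
    }
}
```

In this sketch the failure branch is reached for a canceled future as well, so
a caller waiting for commitFutures to become empty does not hang.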
bq. Is this existing behavior or behavior in the patch? In either case, this
logic is not consistent from a user's point of view. Depending on which async
commit operation ran first on the threadpool and failed, the user will see
anywhere between 1 and N-1 committed outputs. Is that observation correct? If
yes, is that a better choice than saying: the user will see either no outputs
or all successfully committed outputs?
The behavior in this patch is consistent with the existing behavior. For a
vertex, the existing behavior is that if any failure event happens (commit
failure, vertex termination event, etc.), all the commits are aborted,
regardless of whether a given commit succeeded or failed (of course, abort
should have no effect on the successful commits). The behavior in the patch is
that all pending commits are canceled, the vertex waits for them to complete,
then aborts all the commits and moves to a finished state.
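The sequence described above (cancel pending commits, wait for them to
complete, then abort everything) can be sketched as follows. This is an
illustration under assumptions, not the patch itself: the Committer interface,
the class CommitOrAbortSketch and the method commitAll() are hypothetical, and
a plain JDK thread pool stands in for whatever executor the AM uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class CommitOrAbortSketch {
    /** Hypothetical stand-in for an output committer; not the Tez interface. */
    interface Committer {
        void commit() throws Exception;
        void abort();
    }

    /** Returns true if every commit succeeded; otherwise aborts all outputs and returns false. */
    static boolean commitAll(List<? extends Committer> committers, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CompletionService<Void> done = new ExecutorCompletionService<>(pool);
        List<Future<Void>> pending = new ArrayList<>();
        for (Committer c : committers) {
            // Each commit runs on the pool, off the calling thread.
            pending.add(done.submit(() -> { c.commit(); return null; }));
        }
        boolean failed = false;
        try {
            for (int i = 0; i < committers.size(); i++) {
                done.take().get(); // throws ExecutionException on the first failed commit
            }
        } catch (ExecutionException e) {
            failed = true;
        } catch (InterruptedException e) {
            failed = true;
            Thread.currentThread().interrupt();
        }
        if (failed) {
            // Cancel whatever is still queued or running.
            for (Future<Void> f : pending) {
                f.cancel(true);
            }
        }
        // Wait for in-flight commits to finish before aborting.
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (failed) {
            // Abort every output, including those whose commit succeeded.
            for (Committer c : committers) {
                c.abort();
            }
        }
        return !failed;
    }
}
```

Note that, as in the behavior described above, abort is called on every
output once any commit fails, so it is expected to be harmless for outputs
that were already committed.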
> OutputCommitters should not run in the main AM dispatcher thread
> ----------------------------------------------------------------
>
> Key: TEZ-714
> URL: https://issues.apache.org/jira/browse/TEZ-714
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Jeff Zhang
> Priority: Critical
> Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch,
> TEZ-714-3.patch, TEZ-714-4.patch, TEZ-714-5.patch, TEZ-714-6.patch,
> TEZ-714-7.patch, TEZ-714-8.patch, Vertex_2.pdf
>
>
> Follow up jira from TEZ-41.
> 1) If there are multiple OutputCommitters on a Vertex, they can be run in
> parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event
> handling, w.r.t the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.