[ 
https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482492#comment-14482492
 ] 

Jeff Zhang commented on TEZ-714:
--------------------------------

[~bikassaha]
Uploaded a new patch that refactors checkForCompletion().
In addition, I removed the boolean field committed from both DAG and Vertex, 
because I think it is impossible for commit to be invoked multiple times. It 
is possible, however, for the abort operation to be invoked multiple times 
(for example, when moving from FAILED to ERROR).

bq. Will calling cancel on the future object is going to result in 
VertexCommitCallback#onFailure() to be invoked. If not, then the vertex will 
hang on waiting for the commitFutures to be empty because no CommitCompleted 
event will come.
Yes, canceling the future object will cause 
VertexCommitCallback#onFailure() to be invoked, so commitFutures will 
eventually become empty.
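
To illustrate why cancellation still drains commitFutures, here is a minimal 
sketch. It is not the Tez code: it uses the JDK's CompletableFuture as a 
stand-in for the future type in the patch, and the class name, the map layout 
and the onFailureCalls counter are all assumptions made for the example. The 
point it shows is that a completion callback registered on a future also runs 
when that future is canceled, so the bookkeeping entry is removed either way.

```java
import java.util.ArrayList;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class CommitFutureSketch {
    // Hypothetical stand-in for commitFutures: output name -> pending commit.
    final Map<String, CompletableFuture<Void>> commitFutures = new ConcurrentHashMap<>();
    // Counts how many times the failure branch of the callback ran.
    final AtomicInteger onFailureCalls = new AtomicInteger();

    // Registers a pending commit plus a callback that fires on success,
    // failure, or cancellation.
    CompletableFuture<Void> startCommit(String outputName) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        commitFutures.put(outputName, future);
        future.whenComplete((result, error) -> {
            commitFutures.remove(outputName);
            if (error != null) {
                // Cancellation arrives here as a CancellationException.
                onFailureCalls.incrementAndGet();
            }
        });
        return future;
    }

    // Cancels every commit that is still pending; iterates over a copy
    // because the callbacks remove entries from the map.
    void cancelPendingCommits() {
        for (CompletableFuture<Void> f : new ArrayList<>(commitFutures.values())) {
            f.cancel(true);
        }
    }

    public static void main(String[] args) {
        CommitFutureSketch vertex = new CommitFutureSketch();
        vertex.startCommit("output1");
        vertex.startCommit("output2");
        vertex.cancelPendingCommits();
        System.out.println("pending commits: " + vertex.commitFutures.size());
        System.out.println("failure callbacks: " + vertex.onFailureCalls.get());
    }
}
```

In this sketch the callback runs synchronously on the thread that calls 
cancel(); in a real AM the callback would more likely post an event to the 
dispatcher, which is outside the scope of the example.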

bq. Is this existing behavior or behavior in the patch? In either case, this 
logic is not consistent from a users point of view. Depending on which async 
commit operation ran first on the threadpool and failed, the user will see 
anywhere between 1 to N-1 committed outputs. Is that observation correct? If 
yes, is that a better choice than saying - User will see either no outputs or 
all successfully committed outputs?
The behavior in this patch is consistent with the existing behavior. For a 
vertex, the existing behavior is that if any failure event happens (commit 
failure, vertex termination event, etc.), all the commits are aborted, whether 
they succeeded or failed (of course, abort should have no effect on the 
successful commits). The behavior in the patch is that all pending commits are 
canceled; the vertex waits for them to complete, then aborts all the commits 
and moves to a finished state.
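
The sequence described above (cancel pending commits, wait for them to 
finish, abort everything, then finish) can be sketched as follows. This is an 
illustration under stated assumptions, not the patch itself: the Committer 
interface, FakeCommitter, runCommits and the "SUCCEEDED"/"FAILED" strings are 
hypothetical names, and a plain JDK thread pool stands in for whatever 
executor the AM uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class AbortAfterCancelSketch {
    // Hypothetical, simplified view of an output committer.
    interface Committer {
        void commit() throws Exception;
        void abort();
    }

    // Test double that can be told to fail and records whether abort ran.
    static class FakeCommitter implements Committer {
        final boolean failOnCommit;
        volatile boolean aborted = false;
        FakeCommitter(boolean failOnCommit) { this.failOnCommit = failOnCommit; }
        public void commit() throws Exception {
            if (failOnCommit) throw new Exception("commit failed");
        }
        public void abort() { aborted = true; }
    }

    // Runs all commits in parallel, off the calling thread. On the first
    // failure: cancel what is pending, wait for the pool to drain, abort all.
    static String runCommits(List<? extends Committer> committers) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, committers.size()));
        CompletionService<Void> completed = new ExecutorCompletionService<>(pool);
        List<Future<Void>> futures = new ArrayList<>();
        for (Committer c : committers) {
            futures.add(completed.submit(() -> { c.commit(); return null; }));
        }
        boolean failed = false;
        try {
            for (int i = 0; i < committers.size(); i++) {
                completed.take().get(); // throws on the first failed commit
            }
        } catch (ExecutionException | InterruptedException e) {
            failed = true;
        }
        if (failed) {
            for (Future<Void> f : futures) {
                f.cancel(true); // no effect on commits that already finished
            }
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES); // wait for in-flight commits
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (failed) {
            for (Committer c : committers) {
                c.abort(); // abort every output, committed or not
            }
            return "FAILED";
        }
        return "SUCCEEDED";
    }
}
```

For example, calling runCommits with one FakeCommitter(false) and one 
FakeCommitter(true) returns "FAILED" and leaves both committers with 
aborted set, which mirrors the "abort all commits" behavior described above.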

> OutputCommitters should not run in the main AM dispatcher thread
> ----------------------------------------------------------------
>
>                 Key: TEZ-714
>                 URL: https://issues.apache.org/jira/browse/TEZ-714
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Jeff Zhang
>            Priority: Critical
>         Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch, 
> TEZ-714-3.patch, TEZ-714-4.patch, TEZ-714-5.patch, TEZ-714-6.patch, 
> TEZ-714-7.patch, TEZ-714-8.patch, Vertex_2.pdf
>
>
> Follow up jira from TEZ-41.
> 1) If there's multiple OutputCommitters on a Vertex, they can be run in 
> parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event 
> handling, w.r.t the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
