[ 
https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484935#comment-14484935
 ] 

Jeff Zhang commented on TEZ-714:
--------------------------------

[~bikassaha] Thanks for the thorough reviews. Upload new patch to address the 
review comments.

bq. Why has committed check been removed? Does not hurt to keep it there. The 
code may have been added after observing a bug that we may not be considering 
when we remove it in this patch.
Keep it in the new patch. Not sure when the commit will be invoked multiple 
times. If it is for task rerun, then it should fail when rerun happens after 
commit is done.

bq. Can we put a precondition that checks that the current vertex state is 
committing? And similarly in the else stmt that current vertex state is 
terminating.
Add a precondition for the COMMITTING, but for terminating, the commit failure 
may happen in the state of COMMITTING/TERMINATING. And I think the current 
state should always been verified in the state machine. For vertex, the commit 
should only happen in the state of COMMITTING/TERMINATING.

bq. testCommitOutputOnDAGSuccess() The assert conditions are same for all 3 
cases. Not sure how we are differentiating between them?
Add assertion on the diagnostics.

bq. Please use wait/notify instead of while loop
Because when the commit is started is not determined, so I can not guarantee to 
call notify after wait is invoked. That's why I use and a flag and while loop.

bq. How is the test testing one success causes other to abort, if both 
committers are setup to fail?
For each commit, I have a flag to block the commit operation. So the commit 
won't finish unless I unblock it. Anyway, I make the second commit flag as true 
in the new patch to avoid misunderstanding.

bq. How is ordering guaranteed here? Unless the test is using a single threaded 
shared threadpool.
Same as above.

bq. We are missing the case where the commit is cancelled and it gets cancelled 
on the threadpool and invokes onFailure(). This may be possible to test if we 
set the threadpool to be a fixed size of 1.
Add it in the new patch.

> OutputCommitters should not run in the main AM dispatcher thread
> ----------------------------------------------------------------
>
>                 Key: TEZ-714
>                 URL: https://issues.apache.org/jira/browse/TEZ-714
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Jeff Zhang
>            Priority: Critical
>         Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch, 
> TEZ-714-3.patch, TEZ-714-4.patch, TEZ-714-5.patch, TEZ-714-6.patch, 
> TEZ-714-7.patch, TEZ-714-8.patch, TEZ-714-9.patch, Vertex_2.pdf
>
>
> Follow up jira from TEZ-41.
> 1) If there's multiple OutputCommitters on a Vertex, they can be run in 
> parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event 
> handling, w.r.t the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to