[ 
https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315427#comment-14315427
 ] 

Bikas Saha commented on TEZ-714:
--------------------------------

I was wondering if the shared threadpool added in TEZ-1914 can be leveraged 
here. Looking at the code, there seems to be a potentially simple solution that 
could be considered.

1) VertexImpl has a new COMMITTING state. After vertex succeeds and if 1 or 
more commits are needed, then each commit operation is wrapped in a callable 
and scheduled onto the threadpool. The count is maintained and state changes to 
COMMITTING. This is similar to INITIALIZING.
2) When these callable complete then onSuccess they send a V_COMMIT_SUCCESS 
event. While in COMMITTING state, when all such events are received then the 
VertexImpl moves to SUCCEEDED state and does the remaining processing like 
sending out completed events
3) On failure the callable call onFailure which sends a V_COMMIT_FAILED event. 
Receiving this event make the vertex go into FAILED state.
4) From COMMITTING state, vertex will go to KILLED upon termination event (like 
in RUNNING state) or FAILED upon task re-running (like succeeded state)
5) KILLED/FAILED states ignore any further V_COMMIT_FAILED or V_COMMIT_SUCCESS 
events.

For DAGImpl, the change is complicated by the presence of shared outputs but 
can be simplified. 
1) For each shared output a similar callable is scheduled for commit but the 
DAG does not change state and increment a commitInProgress counter.
2) For each COMMIT_SUCCESS event the commitInProgress counter is decremented.
3) For COMMIT_FAILED event the DAG fails.
4) When all vertices finish then the DAG may go on an commit all vertex 
outputs. These are also scheduled as earlier as callables.
5) Before transitioning to success, DAG checks the value of commitInProgress 
and goes to SUCCEEDED if that counter == 0. Else it goes to COMMITTING state.
6) COMMITTING in DAGImpl behaves like that in VertexImpl. Waits for 
commitInProgress counter to go to 0. Goes to Killed on termination and fails on 
vertexReRunning if its a vertex doing commit.

> OutputCommitters should not run in the main AM dispatcher thread
> ----------------------------------------------------------------
>
>                 Key: TEZ-714
>                 URL: https://issues.apache.org/jira/browse/TEZ-714
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Jeff Zhang
>            Priority: Critical
>
> Follow up jira from TEZ-41.
> 1) If there's multiple OutputCommitters on a Vertex, they can be run in 
> parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event 
> handling, w.r.t the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to