[
https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315427#comment-14315427
]
Bikas Saha commented on TEZ-714:
--------------------------------
I was wondering if the shared threadpool added in TEZ-1914 can be leveraged
here. Looking at the code, there seems to be a potentially simple solution that
could be considered.
1) VertexImpl has a new COMMITTING state. After vertex succeeds and if 1 or
more commits are needed, then each commit operation is wrapped in a callable
and scheduled onto the threadpool. The count is maintained and state changes to
COMMITTING. This is similar to INITIALIZING.
2) When these callable complete then onSuccess they send a V_COMMIT_SUCCESS
event. While in COMMITTING state, when all such events are received then the
VertexImpl moves to SUCCEEDED state and does the remaining processing like
sending out completed events
3) On failure the callable call onFailure which sends a V_COMMIT_FAILED event.
Receiving this event make the vertex go into FAILED state.
4) From COMMITTING state, vertex will go to KILLED upon termination event (like
in RUNNING state) or FAILED upon task re-running (like succeeded state)
5) KILLED/FAILED states ignore any further V_COMMIT_FAILED or V_COMMIT_SUCCESS
events.
For DAGImpl, the change is complicated by the presence of shared outputs but
can be simplified.
1) For each shared output a similar callable is scheduled for commit but the
DAG does not change state and increment a commitInProgress counter.
2) For each COMMIT_SUCCESS event the commitInProgress counter is decremented.
3) For COMMIT_FAILED event the DAG fails.
4) When all vertices finish then the DAG may go on an commit all vertex
outputs. These are also scheduled as earlier as callables.
5) Before transitioning to success, DAG checks the value of commitInProgress
and goes to SUCCEEDED if that counter == 0. Else it goes to COMMITTING state.
6) COMMITTING in DAGImpl behaves like that in VertexImpl. Waits for
commitInProgress counter to go to 0. Goes to Killed on termination and fails on
vertexReRunning if its a vertex doing commit.
> OutputCommitters should not run in the main AM dispatcher thread
> ----------------------------------------------------------------
>
> Key: TEZ-714
> URL: https://issues.apache.org/jira/browse/TEZ-714
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Jeff Zhang
> Priority: Critical
>
> Follow up jira from TEZ-41.
> 1) If there's multiple OutputCommitters on a Vertex, they can be run in
> parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event
> handling, w.r.t the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)