[ https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366878#comment-14366878 ]
Jeff Zhang commented on TEZ-714:
--------------------------------
[~bikassaha] I think the biggest issue in my patch is the granularity of the
committer thread. Currently one task commits at the vertex/DAG level, but I
think it should be one OutputCommitter per thread, so that multiple committers
can run in parallel. I will update the patch later.
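The per-committer granularity proposed above can be sketched in plain Java: each committer is submitted as its own task to a shared pool, so several committers on one vertex commit in parallel. `SimpleCommitter`, `PerCommitterCommit`, `commitAll` and `named` are hypothetical names for illustration, not the Tez `OutputCommitter` API. For brevity the caller blocks on the futures here; in the AM that wait would not happen on the dispatcher thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical stand-in for one output's committer; NOT the Tez API. */
interface SimpleCommitter {
    String name();
    void commitOutput() throws Exception;
}

final class PerCommitterCommit {
    /**
     * Submits each committer as its own task (one committer per task),
     * then waits for all of them. Returns the names of committers whose
     * commit threw; an empty list means every commit succeeded.
     */
    static List<String> commitAll(List<SimpleCommitter> committers, ExecutorService pool)
            throws InterruptedException {
        List<Future<Void>> futures = new ArrayList<>();
        for (SimpleCommitter c : committers) {
            Callable<Void> task = () -> {
                c.commitOutput();
                return null;
            };
            futures.add(pool.submit(task));
        }
        List<String> failed = new ArrayList<>();
        for (int i = 0; i < futures.size(); i++) {
            try {
                futures.get(i).get(); // blocks only this caller, not the other commits
            } catch (ExecutionException e) {
                failed.add(committers.get(i).name());
            }
        }
        return failed;
    }

    /** Hypothetical convenience factory for examples: a name plus a commit action. */
    static SimpleCommitter named(String name, Callable<Void> action) {
        return new SimpleCommitter() {
            public String name() { return name; }
            public void commitOutput() throws Exception { action.call(); }
        };
    }
}
```

A failing committer does not stop the others from committing; the caller just learns which ones failed and can then decide whether to abort.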
For the other parts of the patch, here is a more detailed description; I hope
it clarifies the changes.
* VertexImpl
** The main change is in checkVertexForCompletion(), where the commit happens.
I changed it to an async commit by wrapping it in a CallableEvent and
submitting it to the shared thread pool. This introduces a new state,
COMMITTING, which indicates that the vertex is committing.
** The abort operation is also made asynchronous. No new state is introduced
here: a vertex that is aborting is in the TERMINATING state.
* DAGImpl
** The main changes are in checkDAGForCompletion(), where the DAG commit
happens, and vertexSucceeded(), where the vertex group commit happens. As in
VertexImpl, I also wrap the DAG commit and the vertex group commits in
CallableEvents and submit them to the shared thread pool. This also introduces
a new state, COMMITTING, which indicates that all the vertices are done but
some commits (the DAG commit or vertex group commits) have not yet completed.
** As in VertexImpl, a DAG that is aborting is in the TERMINATING state.
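The async commit flow described for VertexImpl and DAGImpl could look roughly like this sketch, assuming a single dispatcher thread that owns all state changes and a shared pool that only runs commit/abort work and posts completion events back to a queue. All names (`AsyncCommitEntity`, `CommitState`, `CommitEvent`, `onAllTasksSucceeded`, `handle`) are hypothetical, and the patch's `CallableEvent` API is not reproduced here. The pending-commit counter models a DAG that stays in COMMITTING until its DAG commit and all vertex group commits have finished.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;

/** Hypothetical lifecycle states, loosely mirroring those described above. */
enum CommitState { RUNNING, COMMITTING, SUCCEEDED, TERMINATING, FAILED }

/** Hypothetical completion events posted back to the dispatcher thread. */
enum CommitEvent { COMMIT_SUCCEEDED, COMMIT_FAILED, ABORT_DONE }

/**
 * Hypothetical vertex/DAG that commits asynchronously. Only the dispatcher
 * thread (the caller of onAllTasksSucceeded/handle) touches state; pool
 * threads run commit/abort actions and enqueue completion events.
 */
final class AsyncCommitEntity {
    private final ExecutorService sharedPool;
    private final BlockingQueue<CommitEvent> dispatcherQueue;
    private final List<Callable<Void>> commitActions; // e.g. DAG commit + group commits
    private final Runnable abortAction;
    private CommitState state = CommitState.RUNNING;
    private int pendingCommits;

    AsyncCommitEntity(ExecutorService sharedPool, BlockingQueue<CommitEvent> dispatcherQueue,
                      List<Callable<Void>> commitActions, Runnable abortAction) {
        this.sharedPool = sharedPool;
        this.dispatcherQueue = dispatcherQueue;
        this.commitActions = commitActions;
        this.abortAction = abortAction;
    }

    CommitState state() { return state; }

    /** Dispatcher thread: start all commits without blocking on any of them. */
    void onAllTasksSucceeded() {
        if (commitActions.isEmpty()) {
            state = CommitState.SUCCEEDED;
            return;
        }
        state = CommitState.COMMITTING;
        pendingCommits = commitActions.size();
        for (Callable<Void> action : commitActions) {
            sharedPool.submit(() -> {
                CommitEvent result;
                try {
                    action.call();
                    result = CommitEvent.COMMIT_SUCCEEDED;
                } catch (Exception e) {
                    result = CommitEvent.COMMIT_FAILED;
                }
                dispatcherQueue.add(result); // hand the outcome back to the dispatcher
            });
        }
    }

    /** Dispatcher thread: react to one completion event. */
    void handle(CommitEvent event) {
        switch (event) {
            case COMMIT_SUCCEEDED:
                if (state == CommitState.COMMITTING && --pendingCommits == 0) {
                    state = CommitState.SUCCEEDED;
                }
                break;
            case COMMIT_FAILED:
                if (state == CommitState.COMMITTING) {
                    state = CommitState.TERMINATING; // async abort now in progress
                    sharedPool.submit(() -> {
                        try {
                            abortAction.run();
                        } finally {
                            dispatcherQueue.add(CommitEvent.ABORT_DONE);
                        }
                    });
                }
                break;
            case ABORT_DONE:
                if (state == CommitState.TERMINATING) {
                    state = CommitState.FAILED;
                }
                break;
        }
    }
}
```

Because only the dispatcher thread reads or writes `state` and `pendingCommits`, no locking is needed, and the dispatcher is never blocked by a slow commit or abort. Late events (for example a commit that succeeds after another one already failed) are simply ignored once the entity has left COMMITTING.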
> OutputCommitters should not run in the main AM dispatcher thread
> ----------------------------------------------------------------
>
> Key: TEZ-714
> URL: https://issues.apache.org/jira/browse/TEZ-714
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Jeff Zhang
> Priority: Critical
> Attachments: DAG_2.pdf, TEZ-714-1.patch, Vertex_2.pdf
>
>
> Follow up jira from TEZ-41.
> 1) If there are multiple OutputCommitters on a Vertex, they can be run in
> parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event
> handling w.r.t. the DAG and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)