[
https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377401#comment-14377401
]
Bikas Saha commented on TEZ-714:
--------------------------------
bq. It could, but this may make the transition complicated. Currently we need
to differentiate these 2 kinds of commits. Besides, there are 2 possible states
(RUNNING, COMMITTING) when the commit happens, and we also need to handle 2
different cases (commit success & failure), so there would be 8 different
cases in total in one transition, which may be difficult to read.
I am looking at the TaskAttemptImpl#TerminatedBeforeRunningTransition state
transitions for inspiration. There are some standard things to do when a commit
operation completes, e.g. decrement the outstanding commit counter. If the
commit was a group commit, then write the recovery entry for it. If the commit
fails, then set a flag to abort. This can live in a base transition, say
CommitCompletedTransition. Then we can have
CommitCompletedWhileRunningTransition, which calls the base for the common code
and does running-specific stuff, e.g. trigger job failure upon commit failure.
And another transition, CommitCompletedWhileCommitting, that just waits for the
commit counter to drop to 0. Next, CommitCompletedWhileTerminating, which waits
for all commit operations to complete and then calls abort (this could be
blocking for now).
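A rough, self-contained sketch of the base-plus-specialized transition idea described above. None of these types are actual Tez classes: Entity, CommitCompletedEvent, the State values and the recovery log strings are hypothetical stand-ins. Two details are assumptions rather than something stated above: the recovery entry is written only for successful group commits, and the committing transition picks FAILED or SUCCEEDED from the abort flag once the counter reaches 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are actual Tez classes.
class CommitTransitionSketch {

  enum State { RUNNING, COMMITTING, TERMINATING, SUCCEEDED, FAILED, ABORTED }

  /** Stand-in for the vertex/DAG that the transitions operate on. */
  static class Entity {
    int outstandingCommits;
    boolean abortFlag = false;
    boolean abortCalled = false;
    final List<String> recoveryLog = new ArrayList<>();

    Entity(int outstandingCommits) {
      this.outstandingCommits = outstandingCommits;
    }

    void abort() { abortCalled = true; } // could be blocking for now
  }

  /** Hypothetical event delivered when one commit operation finishes. */
  static class CommitCompletedEvent {
    final String name;
    final boolean groupCommit;
    final boolean succeeded;

    CommitCompletedEvent(String name, boolean groupCommit, boolean succeeded) {
      this.name = name;
      this.groupCommit = groupCommit;
      this.succeeded = succeeded;
    }
  }

  /** Base transition: bookkeeping common to every completed commit. */
  static class CommitCompletedTransition {
    /** Returns the next state; the surrounding state machine would apply it. */
    State transition(Entity e, CommitCompletedEvent ev, State current) {
      e.outstandingCommits--;
      // Assumption: only successful group commits get a recovery entry.
      if (ev.groupCommit && ev.succeeded) {
        e.recoveryLog.add("GROUP_COMMIT_DONE:" + ev.name);
      }
      if (!ev.succeeded) {
        e.abortFlag = true;
      }
      return current;
    }
  }

  static class CommitCompletedWhileRunningTransition extends CommitCompletedTransition {
    @Override
    State transition(Entity e, CommitCompletedEvent ev, State current) {
      super.transition(e, ev, current);
      // Running-specific: a failed commit fails the job right away.
      return ev.succeeded ? State.RUNNING : State.FAILED;
    }
  }

  static class CommitCompletedWhileCommittingTransition extends CommitCompletedTransition {
    @Override
    State transition(Entity e, CommitCompletedEvent ev, State current) {
      super.transition(e, ev, current);
      if (e.outstandingCommits > 0) {
        return State.COMMITTING; // keep waiting for the counter to drop to 0
      }
      // Assumption: the abort flag decides the outcome once all commits are in.
      return e.abortFlag ? State.FAILED : State.SUCCEEDED;
    }
  }

  static class CommitCompletedWhileTerminatingTransition extends CommitCompletedTransition {
    @Override
    State transition(Entity e, CommitCompletedEvent ev, State current) {
      super.transition(e, ev, current);
      if (e.outstandingCommits > 0) {
        return State.TERMINATING; // wait for all commit operations to complete
      }
      e.abort();
      return State.ABORTED;
    }
  }
}
```

With this shape each subclass only has to decide the next state, so the 8 combinations collapse into a few short transitions that share the counter, recovery and abort-flag handling.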
Perhaps all commit events need to share a boolean that they check
before invoking commit. This boolean could be set to false when the vertex/dag
decides to abort. This would make any pending commit operations complete
quickly instead of trying to commit unnecessarily.
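The shared flag could look roughly like the following. This is only a sketch under assumptions: CommitTask and the Runnable standing in for the committer call are hypothetical names, not Tez code, and the flag is modeled as a java.util.concurrent.atomic.AtomicBoolean so that commit threads see the update made by the dispatcher thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: CommitTask is not a Tez class.
class SharedCommitFlagSketch {

  /** One pending commit operation, run on a thread other than the dispatcher. */
  static class CommitTask {
    private final AtomicBoolean commitAllowed; // shared by all commit tasks
    private final Runnable commitAction;       // stand-in for the committer call

    CommitTask(AtomicBoolean commitAllowed, Runnable commitAction) {
      this.commitAllowed = commitAllowed;
      this.commitAction = commitAction;
    }

    /** Returns true if the commit was attempted, false if it was skipped. */
    boolean runIfAllowed() {
      if (!commitAllowed.get()) {
        return false; // vertex/dag already decided to abort: finish quickly
      }
      commitAction.run();
      return true;
    }
  }
}
```

The check only prevents commits that have not started yet; a commit already in progress when the flag flips would still run to completion.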
Some e2e scenarios could be tested via simulation using the MockDAGAppMaster.
Create custom committers that fail/pass as desired and check that the dag
behaved as expected.
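As an illustration of the "fail/pass as desired" committers, here is a minimal standalone sketch. It does not use MockDAGAppMaster or the real Tez OutputCommitter API (which is richer); the Committer interface, ScriptedCommitter and runAllCommits are hypothetical names, and runAllCommits is just a stand-in for the dag whose behaviour the test would check.

```java
import java.util.List;

// Hypothetical sketch: not the real Tez OutputCommitter or MockDAGAppMaster.
class ScriptedCommitterSketch {

  /** Simplified committer interface, assumed for this example only. */
  interface Committer {
    void commit();
  }

  /** Committer whose outcome is fixed up front by the test. */
  static class ScriptedCommitter implements Committer {
    private final boolean shouldFail;
    int commitCalls = 0;

    ScriptedCommitter(boolean shouldFail) {
      this.shouldFail = shouldFail;
    }

    @Override
    public void commit() {
      commitCalls++;
      if (shouldFail) {
        throw new IllegalStateException("simulated commit failure");
      }
    }
  }

  /** Stand-in for the dag under test: succeeds only if every commit succeeds. */
  static boolean runAllCommits(List<? extends Committer> committers) {
    for (Committer c : committers) {
      try {
        c.commit();
      } catch (RuntimeException e) {
        return false; // stop at the first failed commit
      }
    }
    return true;
  }
}
```

A test would plug in a mix of passing and failing committers and assert on the final outcome and on how many commits were actually attempted.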
> OutputCommitters should not run in the main AM dispatcher thread
> ----------------------------------------------------------------
>
> Key: TEZ-714
> URL: https://issues.apache.org/jira/browse/TEZ-714
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Jeff Zhang
> Priority: Critical
> Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch, Vertex_2.pdf
>
>
> Follow up jira from TEZ-41.
> 1) If there's multiple OutputCommitters on a Vertex, they can be run in
> parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event
> handling, w.r.t the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.