[jira] [Commented] (TEZ-714) OutputCommitters should not run in the main AM dispatcher thread

Bikas Saha (JIRA) Sun, 05 Apr 2015 18:07:23 -0700

    [ 
https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14446581#comment-14446581
 ]


Bikas Saha commented on TEZ-714:
--------------------------------

bq. Both task finishing and commit finishing may be the last step to the 
finished state (if no commit then task finishing is the last step, and 
committing finish is the last step if there's commit), so they would both 
trigger checkForCompletion.
Today, the semantics of checkForCompletion() are actually 
checkForTasksCompletion(). It checks if all tasks have succeeded. On success, 
it would complete all commits synchronously and return the final state 
SUCCEEDED. This patch is overloading checkForCompletion() to do both 
checkForTasksCompletion() and checkForCommitsCompletion() resulting in logic 
overloading that is error-prone and makes future changes harder because the 
method is doing 2 things.
{code}
+        LOG.info("All tasks are succeeded, vertex:" + vertex.logIdentifier);
+        if (!vertex.commitVertexOutputs) {
+          // just finish because no vertex committing needed
+          return vertex.finished(VertexState.SUCCEEDED);
+        } else if (!vertex.committed.getAndSet(true)) {
+          // start commit if there're commits or just finish if no commits
+          return commitOrFinish(vertex);
// This part belongs to checkForCommitsCompletion()
+        } else if (vertex.commitFutures.isEmpty()) {
+          // move from COMMITTING to SUCCEEDED
+          return vertex.finished(VertexState.SUCCEEDED);
+        } else {
+          return VertexState.COMMITTING;
         }{code}

bq. What i am doing is not exactly this way. I won't ignore failed commits, 
stead I will cancel pending commits and wait for them to complete and then move 
to failed state. 
Will calling cancel on the future object is going to result in 
VertexCommitCallback#onFailure() to be invoked. If not, then the vertex will 
hang on waiting for the commitFutures to be empty because no CommitCompleted 
event will come.

bq. This behavior is consistent with other cases when there's any fail event 
happens (commit fail, vertex termination event and etc ), all the cases would 
cancel pending commits and wait for them to complete and then move to finished 
state.
Is this existing behavior or behavior in the patch? In either case, this logic 
is not consistent from a users point of view. Depending on which async commit 
operation ran first on the threadpool and failed, the user will see anywhere 
between 1 to N-1 committed outputs. Is that observation correct? If yes, is 
that a better choice than saying - User will see either no outputs or all 
successfully committed outputs?



> OutputCommitters should not run in the main AM dispatcher thread
> ----------------------------------------------------------------
>
>                 Key: TEZ-714
>                 URL: https://issues.apache.org/jira/browse/TEZ-714
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Jeff Zhang
>            Priority: Critical
>         Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch, 
> TEZ-714-3.patch, TEZ-714-4.patch, TEZ-714-5.patch, TEZ-714-6.patch, 
> TEZ-714-7.patch, Vertex_2.pdf
>
>
> Follow up jira from TEZ-41.
> 1) If there's multiple OutputCommitters on a Vertex, they can be run in 
> parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event 
> handling, w.r.t the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (TEZ-714) OutputCommitters should not run in the main AM dispatcher thread

Reply via email to