[
https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386086#comment-14386086
]
Jeff Zhang edited comment on TEZ-714 at 3/30/15 3:14 AM:
---------------------------------------------------------
bq. This is a known issue. There is a comment somewhere in the code but I dont
think its tracked by a jira.
Create TEZ-2249 to track it.
bq. After the first if() check shouldn't the code simply return
vertex.finished(SUCCEEDED) ? Why check again for commitFutures.isEmpty()?
The first if() means that TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is false and
the commit has not been started. In this case call commitOrFinished which would
call async commit if there's any committer for this vertex or just finish if no
committer. The second if() means either
TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is true or all the committers are
completed successfully. May need to refactor code here to make it more clear
bq. Why is checkForCompletion called after tasks finished and commit finished?
Those 2 sounds like different logical steps and should use separate methods.
Perhaps a preCommit() and postCommit() method. If there is nothing to commit
then both can be called back to back.
Both task finishing and commit finishing may be the last step to the finished
state (if no commit then task finishing is the last step, and committing finish
is the last step if there's commit), so they would both trigger
checkForCompletion.
bq. Optimistic case: Ignore failed commits and allow all commits to complete
(with success or fail). If all commits succeed then proceed to succeeded. If
any commit fails, then proceed to failed.
What i am doing is not exactly this way. I won't ignore failed commits, stead I
will cancel pending commits and wait for them to complete and then move to
failed state. This behavior is consistent with other cases when there's any
fail event happens (commit fail, vertex termination event and etc ), all the
cases would cancel pending commits and wait for them to complete and then move
to finished state.
bq. For the optimistic case, all failures should go into the Terminating state.
Errors should not cancel commits.
Why Errors should not cancel commits ? Just because Errors don't have effort on
the data to be committed ?
was (Author: zjffdu):
bq. This is a known issue. There is a comment somewhere in the code but I dont
think its tracked by a jira.
Create TEZ-2249 to track it.
bq. After the first if() check shouldn't the code simply return
vertex.finished(SUCCEEDED) ? Why check again for commitFutures.isEmpty()?
The first if() means that TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is false and
the commit has not been started. In this case call commitOrFinished which would
call async commit if there's any committer for this vertex or just finish if no
committer. The second if() means either
TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is true or all the committers are
completed successfully. May need to refactor code here to make it more clear
bq. Why is checkForCompletion called after tasks finished and commit finished?
Those 2 sounds like different logical steps and should use separate methods.
Perhaps a preCommit() and postCommit() method. If there is nothing to commit
then both can be called back to back.
Both task finishing and commit finishing may be the last step to the finished
state (if no commit then task finishing is the last step, and committing finish
is the last step if there's commit), so they would both trigger
checkForCompletion.
bq. Optimistic case: Ignore failed commits and allow all commits to complete
(with success or fail). If all commits succeed then proceed to succeeded. If
any commit fails, then proceed to failed.
What i am doing is not exactly this way. I won't ignore failed commits, stead I
will cancel pending commits and wait for them to complete and then move to
failed state. This behavior is consistent with other cases when there's any
fail event happens (commit fail, vertex termination event and etc ), all the
cases would cancel pending commits and wait for them to complete and then move
to finished state.
bq. For the optimistic case, all failures should go into the Terminating state.
Errors should not cancel commits.
Why Errors should not cancel commits ? Just because Errors don't have effort on
the data to be committed ?
bq. E.g. TaskCompletedAfterVertexSuccessTransition is clearing all successful
commits and choosing the pessimistic case.
TaskCompletedAfterVertexSuccessTransition is aborting all pending commits. So
the logic does not seem to be consistent everywhere. We need to make this
consistent. Either choose the pessimistic case or choose the optimistic case.
bq. For the abort method itself, it could take a boolean parameter on whether
to abort all or not instead of looking at the successful completions map. This
avoids side-effect code like having to clear the successful maps before
invoking abortVertex().
I think I made a mistake. It is not necessary to clear the successful maps. All
the successful commits should not been aborted no matter what case it is.
> OutputCommitters should not run in the main AM dispatcher thread
> ----------------------------------------------------------------
>
> Key: TEZ-714
> URL: https://issues.apache.org/jira/browse/TEZ-714
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Jeff Zhang
> Priority: Critical
> Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch,
> TEZ-714-3.patch, TEZ-714-4.patch, TEZ-714-5.patch, Vertex_2.pdf
>
>
> Follow up jira from TEZ-41.
> 1) If there's multiple OutputCommitters on a Vertex, they can be run in
> parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event
> handling, w.r.t the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)