[ https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Zhang updated TEZ-714:
---------------------------
Comment: was deleted
(was: Regarding partial output: on second thought, I think we should treat
commits on a per-vertex basis rather than a per-output basis. That means
either all of a vertex's outputs commit successfully or all of them are
aborted. I think the purpose of TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is to
allow an external system to check an intermediate vertex's output at the vertex
level. Say a vertex has 2 outputs and TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS
is false: if the first commit succeeds but the second commit fails, we should
abort both of them and mark the vertex as FAILED. It would be odd for a vertex
to go to FAILED with one commit aborted while the other commit is not aborted.)
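
The all-or-nothing rule proposed in the deleted comment can be sketched roughly
as follows. This is only an illustration, not code from any TEZ-714 patch:
VertexCommitSketch, Committer, VertexState and commitVertexOutputs are
hypothetical names, and the Committer interface is a simplified stand-in rather
than the Tez OutputCommitter API.

```java
import java.util.List;

final class VertexCommitSketch {

    /** Hypothetical, simplified committer; NOT the Tez OutputCommitter API. */
    interface Committer {
        void commit() throws Exception;
        void abort();
    }

    /** Hypothetical terminal states for the vertex in this sketch. */
    enum VertexState { SUCCEEDED, FAILED }

    /**
     * Commits every output of one vertex. If any commit throws, abort() is
     * called on ALL outputs, including those that already committed, and the
     * vertex is reported as FAILED.
     */
    static VertexState commitVertexOutputs(List<Committer> outputs) {
        try {
            for (Committer output : outputs) {
                output.commit();
            }
            return VertexState.SUCCEEDED;
        } catch (Exception e) {
            // Vertex-level semantics: never leave one output committed
            // while another output of the same vertex is aborted.
            for (Committer output : outputs) {
                output.abort();
            }
            return VertexState.FAILED;
        }
    }
}
```

With two outputs where the first commit succeeds and the second throws, this
sketch calls abort() on both and returns FAILED, which is the case the comment
walks through. Whether an already committed output can really be undone depends
on the committer implementation; the sketch simply assumes abort() handles it.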
> OutputCommitters should not run in the main AM dispatcher thread
> ----------------------------------------------------------------
>
> Key: TEZ-714
> URL: https://issues.apache.org/jira/browse/TEZ-714
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Jeff Zhang
> Priority: Critical
> Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch,
> TEZ-714-3.patch, TEZ-714-4.patch, TEZ-714-5.patch, Vertex_2.pdf
>
>
> Follow up jira from TEZ-41.
> 1) If there are multiple OutputCommitters on a Vertex, they can be run in
> parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event
> handling w.r.t. the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.
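
Points 1) and 2) above could look roughly like the sketch below: each commit is
submitted to a separate executor so the dispatcher thread never runs a commit
itself, and commits for the same vertex proceed in parallel. This is a minimal
illustration under stated assumptions, not the approach taken in the attached
patches. ParallelCommitSketch, CommitAction and commitAll are hypothetical
names, and the use of CompletableFuture (Java 8+) is an assumption of the
sketch. Shared commits from point 3) are not covered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

final class ParallelCommitSketch {

    /** Hypothetical stand-in for one committer's commit step; NOT the Tez API. */
    interface CommitAction {
        void run() throws Exception;
    }

    /**
     * Submits every commit to commitPool and returns immediately.
     * The returned future completes with true only if all commits succeeded.
     */
    static CompletableFuture<Boolean> commitAll(List<CommitAction> actions,
                                                Executor commitPool) {
        List<CompletableFuture<Boolean>> results = new ArrayList<>();
        for (CommitAction action : actions) {
            // Each commit runs on the pool, not on the calling thread.
            results.add(CompletableFuture.supplyAsync(() -> {
                try {
                    action.run();
                    return true;
                } catch (Exception e) {
                    return false; // a failed commit is reported, not rethrown
                }
            }, commitPool));
        }
        CompletableFuture<?>[] pending = results.toArray(new CompletableFuture<?>[0]);
        // Once every commit has finished, combine the individual outcomes.
        return CompletableFuture.allOf(pending)
                .thenApply(ignored -> results.stream().allMatch(CompletableFuture::join));
    }
}
```

In a dispatcher-based design, the caller would presumably attach a callback to
the returned future (for example with thenAccept) that posts a "commit
completed" or "commit failed" event back onto the event queue, instead of
waiting on the future in the dispatcher thread.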