[
https://issues.apache.org/jira/browse/TEZ-728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bikas Saha updated TEZ-728:
---------------------------
Attachment: TEZ-728.2.patch
Patch with javadoc changes and dead code removed.
> Semantics of output commit
> --------------------------
>
> Key: TEZ-728
> URL: https://issues.apache.org/jira/browse/TEZ-728
> Project: Apache Tez
> Issue Type: Task
> Reporter: Bikas Saha
> Assignee: Bikas Saha
> Attachments: TEZ-728.1.patch, TEZ-728.2.patch
>
>
> Currently, vertices commit outputs when they succeed. However, if the job
> fails then these outputs are not aborted.
> After speaking to Pig and Hive folks, both allow optional partial visibility
> semantics. For example, if 2 vertices are writing output and one of them (A)
> passes while the other fails, then, based on a user flag, Pig and Hive allow
> the partial output of vertex A to be visible or not. So we need to support
> 1) DAG fails - no output is visible
> 2) DAG fails - partial output is visible
> In order to support this, we could move output commit to DAG completion. If
> the DAG succeeds, commit will be called on all output committers. If the DAG
> fails, then abort will be called on all output committers. Optionally, if the
> DAG fails then commit will be called on all successful vertices and abort
> will be called on all failed vertices.
> This will also help the case when multiple vertices are writing to the same
> output (union store). The DAG can call commit once on that output and ensure
> correct commit semantics according to the commit API.
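
A minimal sketch of the proposed DAG-level commit decision, for illustration
only. The names here (DagCommitSketch, Committer, VertexResult, finishDag,
commitPartialOnFailure) are hypothetical and are not the Tez API or the
contents of the attached patches. The sketch also assumes that an output
shared by several vertices is committed on DAG failure only if every vertex
writing to it succeeded; the description above does not specify that case.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class DagCommitSketch {

    /** Hypothetical committer for one output; not the Tez interface. */
    public interface Committer {
        void commit();
        void abort();
    }

    /** Hypothetical record of a finished vertex and the committers of its outputs. */
    public static final class VertexResult {
        final String name;
        final boolean succeeded;
        final List<Committer> committers;

        public VertexResult(String name, boolean succeeded, List<Committer> committers) {
            this.name = name;
            this.succeeded = succeeded;
            this.committers = committers;
        }
    }

    /**
     * Runs at DAG completion. Each distinct committer is called exactly once,
     * even if several vertices write to the same output (union store).
     */
    public static void finishDag(List<VertexResult> vertices, boolean commitPartialOnFailure) {
        boolean dagSucceeded = true;
        // Maps each distinct committer to "did every vertex writing to it succeed?".
        Map<Committer, Boolean> allWritersSucceeded = new IdentityHashMap<>();
        for (VertexResult vertex : vertices) {
            dagSucceeded &= vertex.succeeded;
            for (Committer committer : vertex.committers) {
                allWritersSucceeded.merge(committer, vertex.succeeded, Boolean::logicalAnd);
            }
        }
        for (Map.Entry<Committer, Boolean> entry : allWritersSucceeded.entrySet()) {
            // DAG succeeded: commit everything.
            // DAG failed: abort everything, unless partial visibility was requested,
            // in which case outputs of successful vertices are still committed.
            boolean commit = dagSucceeded || (commitPartialOnFailure && entry.getValue());
            if (commit) {
                entry.getKey().commit();
            } else {
                entry.getKey().abort();
            }
        }
    }
}
```

With commitPartialOnFailure set to false, a failed DAG leaves no output
visible (case 1); with it set to true, the outputs of successful vertices
stay visible (case 2).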
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)