[ 
https://issues.apache.org/jira/browse/TEZ-728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-728:
---------------------------

    Attachment: TEZ-728.2.patch

Patch with javadoc changes and dead code removed.

> Semantics of output commit
> --------------------------
>
>                 Key: TEZ-728
>                 URL: https://issues.apache.org/jira/browse/TEZ-728
>             Project: Apache Tez
>          Issue Type: Task
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: TEZ-728.1.patch, TEZ-728.2.patch
>
>
> Currently, vertices commit outputs when they succeed. However, if the job 
> fails then these outputs are not aborted.
> After speaking to Pig and Hive folks, both allow optional partial visibility 
> semantics. Suppose there are 2 vertices writing output, and one of them (A) 
> succeeds while the other fails. Based on a user flag, Pig and Hive allow the 
> partial output of vertex A to be visible or not. So we need to support:
> 1) DAG fails - no output is visible
> 2) DAG fails - partial output is visible
> In order to support this, we could move output commit to DAG completion. If 
> the DAG succeeds, commit will be called on all output committers. If the DAG 
> fails, then abort will be called on all output committers. Optionally, if the 
> DAG fails then commit will be called on all successful vertices and abort 
> will be called on all failed vertices.
> This will also help the case when multiple vertices are writing to the same 
> output (union store). The DAG can call commit once on that output and ensure 
> correct commit semantics according to the commit API.
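
The commit-at-DAG-completion proposal quoted above can be sketched roughly as follows. This is only an illustration of the described semantics, not the code in the attached patch: the names DagCommitSketch, Committer, Output, recording, finishDag and the commitPartialOutput flag are all hypothetical and are not part of the Tez API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DagCommitSketch {
    /** Hypothetical stand-in for an output committer (not the Tez interface). */
    interface Committer {
        void commit();
        void abort();
    }

    /** Hypothetical pairing of one output's committer with its vertex's final status. */
    static final class Output {
        final String name;
        final boolean vertexSucceeded;
        final Committer committer;

        Output(String name, boolean vertexSucceeded, Committer committer) {
            this.name = name;
            this.vertexSucceeded = vertexSucceeded;
            this.committer = committer;
        }
    }

    /** Test helper: a committer that only records which call it received. */
    static Committer recording(String name, List<String> log) {
        return new Committer() {
            @Override public void commit() { log.add("commit:" + name); }
            @Override public void abort() { log.add("abort:" + name); }
        };
    }

    /**
     * Runs once, at DAG completion.
     * - DAG succeeded: commit every output.
     * - DAG failed, flag off: abort every output (no output is visible).
     * - DAG failed, flag on: commit outputs of successful vertices, abort the rest.
     */
    static void finishDag(boolean dagSucceeded, boolean commitPartialOutput,
                          List<Output> outputs) {
        for (Output o : outputs) {
            boolean commit = dagSucceeded || (commitPartialOutput && o.vertexSucceeded);
            if (commit) {
                o.committer.commit();
            } else {
                o.committer.abort();
            }
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        // Vertex A succeeded, vertex B failed, so the DAG as a whole failed.
        List<Output> outputs = Arrays.asList(
            new Output("A", true, recording("A", log)),
            new Output("B", false, recording("B", log)));

        finishDag(false, false, outputs); // case 1: no output is visible
        finishDag(false, true, outputs);  // case 2: partial output is visible
        System.out.println(log);
    }
}
```

Because the decision is made in one place after the DAG finishes, an output shared by several vertices (the union store case) could be represented as a single Output entry and would then be committed or aborted exactly once.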



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
