[ 
https://issues.apache.org/jira/browse/TEZ-728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha resolved TEZ-728.
----------------------------

       Resolution: Fixed
    Fix Version/s: 0.3.0

commit cdb62f9003686e4408ee39addfba23eb6fa2c5d6
Author: Bikas Saha <[email protected]>
Date:   Fri Jan 17 14:32:18 2014 -0800

    TEZ-728. Semantics of output commit (bikas)

> Semantics of output commit
> --------------------------
>
>                 Key: TEZ-728
>                 URL: https://issues.apache.org/jira/browse/TEZ-728
>             Project: Apache Tez
>          Issue Type: Task
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>             Fix For: 0.3.0
>
>         Attachments: TEZ-728.1.patch, TEZ-728.2.patch, TEZ-728.3.patch
>
>
> Currently, vertices commit their outputs when they succeed. However, if the 
> job later fails, these outputs are not aborted.
> After speaking to Pig and Hive folks, both allow optional partial visibility 
> semantics. For example, suppose 2 vertices write output, one of them (A) 
> succeeds, and the other fails. Based on a user flag, Pig and Hive allow the 
> partial output of vertex A to be either visible or not. So we need to support 
> 1) DAG fails - no output is visible
> 2) DAG fails - partial output is visible
> In order to support this, we could move output commit to DAG completion. If 
> the DAG succeeds, commit will be called on all output committers. If the DAG 
> fails, then abort will be called on all output committers. Optionally, if the 
> DAG fails then commit will be called on all successful vertices and abort 
> will be called on all failed vertices.
> This will also help the case when multiple vertices are writing to the same 
> output (union store). The DAG can call commit once on that output and ensure 
> correct commit semantics according to the commit API.
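
The commit-at-DAG-completion rule described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the TEZ-728 patch: the names DagCommitSketch, Committer, VertexOutput, finishDag, recording, and allowPartialOutput are all hypothetical and are not Tez APIs. It assumes one committer per vertex and does not model the union store case, where several vertices share a single output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names are Tez APIs.
class DagCommitSketch {

    /** Minimal stand-in for an output committer. */
    interface Committer {
        void commit();
        void abort();
    }

    /** One vertex's final state plus the committer for its output. */
    static final class VertexOutput {
        final boolean succeeded;
        final Committer committer;

        VertexOutput(boolean succeeded, Committer committer) {
            this.succeeded = succeeded;
            this.committer = committer;
        }
    }

    /**
     * Called once, when the DAG reaches a terminal state.
     * allowPartialOutput models the user flag mentioned in the issue.
     */
    static void finishDag(boolean dagSucceeded, boolean allowPartialOutput,
                          Map<String, VertexOutput> vertices) {
        for (VertexOutput v : vertices.values()) {
            // DAG success commits everything; on failure, commit only
            // successful vertices, and only if partial output is allowed.
            boolean commit = dagSucceeded || (allowPartialOutput && v.succeeded);
            if (commit) {
                v.committer.commit();
            } else {
                v.committer.abort();
            }
        }
    }

    /** Committer that records each call in a shared log. */
    static Committer recording(String name, List<String> log) {
        return new Committer() {
            @Override public void commit() { log.add("commit " + name); }
            @Override public void abort() { log.add("abort " + name); }
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Map<String, VertexOutput> vertices = new LinkedHashMap<>();
        vertices.put("A", new VertexOutput(true, recording("A", log)));
        vertices.put("B", new VertexOutput(false, recording("B", log)));

        finishDag(false, false, vertices); // case 1: no output is visible
        finishDag(false, true, vertices);  // case 2: partial output is visible
        System.out.println(log); // [abort A, abort B, commit A, abort B]
    }
}
```

Deferring the decision to a single finishDag call is what lets one flag select between the two failure cases; with per-vertex commit on success, vertex A's output would already be visible before the DAG's fate is known.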



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
