[ https://issues.apache.org/jira/browse/TEZ-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875273#comment-13875273 ]

Siddharth Seth commented on TEZ-728:
------------------------------------

The OutputCommitter javadoc has a lot of references to 'Vertex', which may or
may not make sense in the context of multiple Outputs. I think that should be
handled in a separate jira.

Regarding not changing Vertex state: following the offline conversation, that's
fine, and we'll likely end up adding an API for monitoring the state of an
Output.

+1. Looks good otherwise.

> Semantics of output commit
> --------------------------
>
>                 Key: TEZ-728
>                 URL: https://issues.apache.org/jira/browse/TEZ-728
>             Project: Apache Tez
>          Issue Type: Task
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: TEZ-728.1.patch, TEZ-728.2.patch
>
>
> Currently, vertices commit outputs when they succeed. However, if the job
> fails, these outputs are not aborted.
> After speaking to the Pig and Hive folks, we learned that both support
> optional partial-visibility semantics. For example, if 2 vertices write
> output and one of them (A) succeeds while the other fails, Pig and Hive
> allow, based on a user flag, the partial output of vertex A to be either
> visible or not. So we need to support:
> 1) DAG fails - no output is visible
> 2) DAG fails - partial output is visible
> In order to support this, we could move output commit to DAG completion. If
> the DAG succeeds, commit will be called on all output committers. If the DAG
> fails, abort will be called on all output committers. Optionally (when
> partial output is allowed), if the DAG fails, commit will be called on the
> outputs of all successful vertices and abort on those of all failed vertices.
> This will also help the case when multiple vertices are writing to the same 
> output (union store). The DAG can call commit once on that output and ensure 
> correct commit semantics according to the commit API.
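A minimal sketch of the proposed DAG-completion policy, for illustration only. It assumes a hypothetical `Committer` interface with `commit()`/`abort()` and a hypothetical `VertexOutput` record; these are not the Tez OutputCommitter API. Each distinct committer is resolved exactly once, which covers the union-store case. The rule for a shared output under partial visibility (commit only if every writing vertex succeeded) is an assumption; the issue does not specify it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DagCommitSketch {

    // Hypothetical committer for illustration; NOT Tez's OutputCommitter API.
    interface Committer {
        void commit();
        void abort();
    }

    // Hypothetical record: one vertex writing to one output.
    static final class VertexOutput {
        final boolean vertexSucceeded;
        final Committer committer;

        VertexOutput(boolean vertexSucceeded, Committer committer) {
            this.vertexSucceeded = vertexSucceeded;
            this.committer = committer;
        }
    }

    // Runs once at DAG completion. allowPartialOutput models the user flag.
    static void onDagComplete(List<VertexOutput> outputs, boolean dagSucceeded,
                              boolean allowPartialOutput) {
        // Group by committer so a shared (union store) output is handled once.
        // Assumption: a shared output counts as successful only if all its writers succeeded.
        Map<Committer, Boolean> allWritersSucceeded = new LinkedHashMap<>();
        for (VertexOutput out : outputs) {
            allWritersSucceeded.merge(out.committer, out.vertexSucceeded, Boolean::logicalAnd);
        }
        for (Map.Entry<Committer, Boolean> e : allWritersSucceeded.entrySet()) {
            // DAG success: commit everything. DAG failure: abort everything,
            // unless partial output is allowed and this output's writers all succeeded.
            boolean commit = dagSucceeded || (allowPartialOutput && e.getValue());
            if (commit) {
                e.getKey().commit();
            } else {
                e.getKey().abort();
            }
        }
    }

    // Demo helper that records the calls it receives.
    static final class RecordingCommitter implements Committer {
        final String name;
        final List<String> log;

        RecordingCommitter(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        public void commit() { log.add("commit " + name); }

        public void abort() { log.add("abort " + name); }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Committer a = new RecordingCommitter("A", log);
        Committer b = new RecordingCommitter("B", log);
        // Vertex A succeeded, vertex B failed, DAG failed, partial output allowed.
        onDagComplete(List.of(new VertexOutput(true, a), new VertexOutput(false, b)), false, true);
        System.out.println(log); // [commit A, abort B]
    }
}
```

With the flag off, the same failed DAG aborts both outputs, matching case 1) above; with it on, only A's output becomes visible, matching case 2).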



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
