[ 
https://issues.apache.org/jira/browse/TEZ-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880124#comment-13880124
 ] 

Bikas Saha commented on TEZ-678:
--------------------------------

AliasVertex extending Vertex is very convenient because it's modeling a virtual 
union vertex, but with restrictions. I will think about how to avoid using 
dag.addVertex() for it. Perhaps, instead of Vertex.createAlias(), we can have 
dag.addAlias(), which creates the alias and stores it internally.
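To make the dag.addAlias() idea concrete, here is a minimal sketch under a toy model. The classes `Dag`, `Vertex` and `Alias`, and the checks inside `addAlias`, are hypothetical stand-ins and not the Tez API. The point it shows is that the alias is created and owned by the DAG but kept in a separate map from real vertices, so it never goes through addVertex() and is never scheduled with tasks of its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model: none of these classes are the real Tez API.
public class AliasSketch {

    // A plain vertex that would run tasks.
    static final class Vertex {
        final String name;
        Vertex(String name) { this.name = name; }
    }

    // A virtual vertex: it only groups member vertices and runs no tasks itself.
    static final class Alias {
        final String name;
        final List<Vertex> members;
        Alias(String name, List<Vertex> members) {
            this.name = name;
            this.members = Collections.unmodifiableList(new ArrayList<>(members));
        }
    }

    static final class Dag {
        private final Map<String, Vertex> vertices = new LinkedHashMap<>();
        private final Map<String, Alias> aliases = new LinkedHashMap<>();

        Dag addVertex(Vertex v) {
            vertices.put(v.name, v);
            return this;
        }

        // Creates the alias and stores it apart from the real vertices.
        // Members must already be in the DAG; names must be unique.
        Alias addAlias(String name, Vertex... members) {
            for (Vertex m : members) {
                if (!vertices.containsKey(m.name)) {
                    throw new IllegalArgumentException("Unknown member vertex: " + m.name);
                }
            }
            if (vertices.containsKey(name) || aliases.containsKey(name)) {
                throw new IllegalArgumentException("Duplicate name: " + name);
            }
            Alias alias = new Alias(name, Arrays.asList(members));
            aliases.put(name, alias);
            return alias;
        }

        int vertexCount() { return vertices.size(); }
        int aliasCount() { return aliases.size(); }
    }

    public static void main(String[] args) {
        Vertex map1 = new Vertex("map1");
        Vertex map2 = new Vertex("map2");
        Dag dag = new Dag().addVertex(map1).addVertex(map2);
        Alias union = dag.addAlias("union", map1, map2);
        System.out.println(union.members.size() + " members, " + dag.vertexCount() + " vertices");
    }
}
```

Keeping aliases in their own map is one way to get the convenience of a vertex-like handle without letting the alias leak into the set of vertices that get scheduled.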

A vertex can participate in multiple aliases. An alias can have multiple 
outputs, though I don't see any real use case that would require that. The same 
goes for multiple edges. I had initially added a check that an alias has only 
one outgoing edge but later removed it; I should probably add it back. I do not 
want to add a group descriptor for an edge because it does not make sense by 
itself. An alias needs such a grouper and can handle it within its special 
alias logic. If really needed, we could later enhance the alias API to support 
different groupers. An alias is cheap to create, so it should be fine to create 
multiple aliases if needed.
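The two rules above (a vertex may belong to several aliases; an alias should have exactly one outgoing edge) can be checked with a small validation pass. This is a minimal sketch assuming a toy representation: aliases as a map from alias name to member names, and edges as hypothetical `Edge(from, to)` pairs. It is not how Tez validates a DAG.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical validation sketch; not the Tez implementation.
public class AliasValidationSketch {

    static final class Edge {
        final String from;
        final String to;
        Edge(String from, String to) { this.from = from; this.to = to; }
    }

    // Returns one error message per alias whose number of outgoing edges is
    // not exactly one. Shared members across aliases are deliberately allowed.
    static List<String> validate(Map<String, List<String>> aliasMembers, List<Edge> edges) {
        Map<String, Integer> outDegree = new HashMap<>();
        for (Edge e : edges) {
            if (aliasMembers.containsKey(e.from)) {
                outDegree.merge(e.from, 1, Integer::sum);
            }
        }
        List<String> errors = new ArrayList<>();
        for (String alias : aliasMembers.keySet()) {
            int n = outDegree.getOrDefault(alias, 0);
            if (n != 1) {
                errors.add(alias + " has " + n + " outgoing edges, expected 1");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, List<String>> aliases = new HashMap<>();
        aliases.put("unionA", Arrays.asList("v1", "v2"));
        aliases.put("unionB", Arrays.asList("v2", "v3")); // v2 is in both aliases: allowed
        List<Edge> edges = Arrays.asList(
            new Edge("unionA", "reduce1"),
            new Edge("unionB", "reduce2"),
            new Edge("unionB", "reduce3"));             // second edge out of unionB: flagged
        validate(aliases, edges).forEach(System.out::println);
    }
}
```

Because an alias is cheap, the fix for a flagged alias in this model is simply to define a second alias over the same members, one per consumer.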

Only a single output/committer is specified on the alias, so I'm not sure what 
you mean by multiple committers being specified. A single commit is executed at 
runtime.

I tried to post incremental patches to help review, but that didn't quite work 
out. I should probably have committed the incremental patches in separate JIRAs. 
Let's try to follow that going forward so that we don't end up with big patches 
like this. It would be great if we can land this in 0.3 so that Hive can be 
unblocked for unions without having to wait for another release.

> Support for union operations
> ----------------------------
>
>                 Key: TEZ-678
>                 URL: https://issues.apache.org/jira/browse/TEZ-678
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: TEZ-678.1.patch, TEZ-678.2.patch, TEZ-678.3.patch, 
> TEZ-678.4.patch, TEZ-678.5.patch
>
>
> Unions represent a collection of results obtained from different branches of 
> computation. The collection is a virtual operation that does not need to 
> execute any tasks. Subsequent operations can conveniently work on the union 
> named data set instead of each individual member of the union. While unions 
> can be implemented efficiently without additional support from Tez, having 
> API support can make it easier and less error-prone to implement.
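The description says a union can be implemented without extra support from Tez. One way to picture that is to rewrite every edge out of the union into one physical edge per member, so the union itself never runs a task. This is a minimal sketch assuming a toy edge list and a hypothetical `expand` helper; it is not the TEZ-678 implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical expansion: each edge out of a union alias becomes one
// physical edge per member, so the alias itself never runs any tasks.
public class UnionExpansionSketch {

    static final class Edge {
        final String from;
        final String to;
        Edge(String from, String to) { this.from = from; this.to = to; }
        @Override public String toString() { return from + " -> " + to; }
    }

    static List<Edge> expand(Map<String, List<String>> aliasMembers, List<Edge> logicalEdges) {
        List<Edge> physical = new ArrayList<>();
        for (Edge e : logicalEdges) {
            List<String> members = aliasMembers.get(e.from);
            if (members == null) {
                physical.add(e);                     // ordinary edge, kept as-is
            } else {
                for (String m : members) {
                    physical.add(new Edge(m, e.to)); // one edge per union member
                }
            }
        }
        return physical;
    }

    public static void main(String[] args) {
        Map<String, List<String>> aliases = new LinkedHashMap<>();
        aliases.put("union", Arrays.asList("scanA", "scanB"));
        List<Edge> logical = Arrays.asList(
            new Edge("union", "aggregate"),
            new Edge("aggregate", "sink"));
        expand(aliases, logical).forEach(System.out::println);
    }
}
```

Running `main` prints `scanA -> aggregate`, `scanB -> aggregate` and `aggregate -> sink`. Doing this by hand is what makes it error-prone: the consumer now sees several inputs and still has to treat them as one logical input, which is the grouping concern discussed in the comment above.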



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
