[ https://issues.apache.org/jira/browse/TEZ-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880537#comment-13880537 ]
Siddharth Seth commented on TEZ-678:
------------------------------------
bq. AliasVertex extending Vertex is very convenient because it's modeling a
virtual union vertex but with restrictions .....
It may be convenient, but I'm not sure it's correct. Is 'AliasVertex' really a
vertex, or just a convenience grouping of vertices that require the same
operation? It doesn't actually form part of the graph; the grouped vertices,
however, are hooked into the graph. The only part of Vertex that is used is
addOutputs; everything else is either new functionality (alias, ID) or is not
required/supported.
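To make the distinction concrete, here is a minimal sketch of a group that is not a Vertex subclass. The classes `Vertex` and `VertexGroup` and the method `addOutput` below are hypothetical stand-ins, not the actual Tez classes. The group holds no tasks and is not a node in the graph; its only Vertex-like behavior is forwarding an output to each member.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model classes for illustration only; these are NOT the Tez API.
public class VertexGroupSketch {

    // A real vertex: a node in the DAG that runs tasks and can own outputs.
    static final class Vertex {
        final String name;
        final List<String> outputs = new ArrayList<>();

        Vertex(String name) {
            this.name = name;
        }

        void addOutput(String outputName) {
            outputs.add(outputName);
        }
    }

    // A group deliberately does NOT extend Vertex: it has no tasks and is not a
    // node in the graph. It only records its members and forwards to them.
    static final class VertexGroup {
        final String groupName;
        final List<Vertex> members;

        VertexGroup(String groupName, Vertex... members) {
            this.groupName = groupName;
            this.members = List.of(members);
        }

        // The only Vertex-like operation the group needs: attach the same
        // output to every member vertex.
        void addOutput(String outputName) {
            for (Vertex v : members) {
                v.addOutput(outputName);
            }
        }
    }

    public static void main(String[] args) {
        Vertex map1 = new Vertex("map1");
        Vertex map2 = new Vertex("map2");
        VertexGroup union = new VertexGroup("union", map1, map2);
        union.addOutput("unionOutput");
        System.out.println(map1.outputs + " " + map2.outputs); // prints [unionOutput] [unionOutput]
    }
}
```

Composition keeps the group out of the graph while still giving users a single handle for attaching outputs.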
bq. A vertex can participate in multiple aliases. An alias can have multiple
outputs, though I don't see any real use cases that would require that. Same for
multiple edges .....
I believe Hive can make use of this for multi-inserts. In terms of ease of use,
I'd definitely prefer creating a single group and hooking it up to multiple
edges rather than having to create the same group multiple times over. If we add
a GroupedInput, it would only apply to edges that have a VertexGroup on them.
Convenience aside, doesn't the Input descriptor really belong to the edge?
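A rough sketch of the "one group, many edges" shape, using hypothetical `VertexGroup` and `GroupEdge` classes (not the Tez API) and made-up descriptor names such as `inputForInsert1`. The group is built once, and each edge that consumes it carries its own input descriptor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only; not the Tez API.
public class GroupEdgeSketch {

    static final class VertexGroup {
        final String name;
        final List<String> memberVertices;

        VertexGroup(String name, List<String> memberVertices) {
            this.name = name;
            this.memberVertices = memberVertices;
        }
    }

    // The edge, not the group, carries the input descriptor used by the
    // destination vertex to read the union's data.
    static final class GroupEdge {
        final VertexGroup source;
        final String destinationVertex;
        final String inputDescriptor;

        GroupEdge(VertexGroup source, String destinationVertex, String inputDescriptor) {
            this.source = source;
            this.destinationVertex = destinationVertex;
            this.inputDescriptor = inputDescriptor;
        }

        // One logical group edge stands for one physical connection per member.
        int physicalConnections() {
            return source.memberVertices.size();
        }
    }

    public static void main(String[] args) {
        // The group is created once...
        VertexGroup union = new VertexGroup("union", List.of("scanA", "scanB"));

        // ...and reused on several edges (e.g. a multi-insert), each edge
        // choosing its own input descriptor.
        List<GroupEdge> edges = new ArrayList<>();
        edges.add(new GroupEdge(union, "insert1", "inputForInsert1"));
        edges.add(new GroupEdge(union, "insert2", "inputForInsert2"));

        int total = 0;
        for (GroupEdge e : edges) {
            total += e.physicalConnections();
        }
        System.out.println(total); // prints 4
    }
}
```

Keeping the descriptor on the edge means a second consumer of the same union does not require a second copy of the group.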
bq. Only a single output/committer is specified on the alias. So I'm not sure what
you mean by multiple committers being specified. A single commit is executed at
runtime.
I must've read this incorrectly. The unit test seemed to be looking for two
invocations of commit.
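The single-commit behavior described in the quote can be sketched as follows, assuming (hypothetically) that committers are tracked per output name, so an output shared by all members of a group is committed exactly once. `OutputRegistry`, `CountingCommitter` and `attach` are illustrative names, not Tez classes. In this model, a test would expect one commit invocation for the group output, however many members write to it.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration only; not the Tez API.
public class GroupCommitSketch {

    // Committer that just counts how often it is invoked.
    static final class CountingCommitter {
        int commitCalls = 0;

        void commit() {
            commitCalls++;
        }
    }

    static final class OutputRegistry {
        // Which output each member vertex writes to.
        final Map<String, String> memberOutputs = new LinkedHashMap<>();
        // One committer per distinct output name.
        private final Map<String, CountingCommitter> committers = new LinkedHashMap<>();

        void attach(String memberVertex, String outputName, CountingCommitter committer) {
            memberOutputs.put(memberVertex, outputName);
            // A shared group output is registered only once.
            committers.putIfAbsent(outputName, committer);
        }

        // Called once when the whole DAG succeeds.
        void commitAll() {
            for (CountingCommitter c : committers.values()) {
                c.commit();
            }
        }
    }

    public static void main(String[] args) {
        CountingCommitter committer = new CountingCommitter();
        OutputRegistry registry = new OutputRegistry();
        // Both group members write to the same logical output.
        for (String member : List.of("map1", "map2")) {
            registry.attach(member, "unionOutput", committer);
        }
        registry.commitAll();
        System.out.println(registry.memberOutputs.size() + " writers, "
                + committer.commitCalls + " commit"); // prints 2 writers, 1 commit
    }
}
```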
Yep, we should get this in before 0.3.
> Support for union operations
> ----------------------------
>
> Key: TEZ-678
> URL: https://issues.apache.org/jira/browse/TEZ-678
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Bikas Saha
> Assignee: Bikas Saha
> Attachments: TEZ-678.1.patch, TEZ-678.2.patch, TEZ-678.3.patch,
> TEZ-678.4.patch, TEZ-678.5.patch
>
>
> Unions represent a collection of results obtained from different branches of
> computation. The collection is a virtual operation that does not need to
> execute any tasks. Subsequent operations can conveniently work on the named
> union data set instead of on each individual member of the union. While unions
> can be implemented efficiently without additional support from Tez, API support
> makes them easier and less error-prone to implement.