[ 
https://issues.apache.org/jira/browse/TEZ-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879095#comment-13879095
 ] 

Siddharth Seth commented on TEZ-678:
------------------------------------

Looks like we're effectively grouping equivalent or similar vertices together 
as a convenience for users, after which they can define similar operations on 
all of these vertices as a group rather than having to set them up individually.

Instead of AliasVertex extending Vertex, this could just be a separate 
construct in itself (something like VertexGroup). I'm not sure an AliasVertex 
itself fits very well into a Graph, since it's not really a vertex. Having a 
separate construct gets rid of this concern. It also gets rid of all the 
additional methods on a Vertex which don't apply to a VertexGroup. Since this 
is getting converted into a helper, it should be fairly clear from the API 
itself that this vertex group doesn't have a physical representation, cannot 
be monitored individually, etc. Edges could be set up between Vertices, or 
between a VertexGroup and a Vertex.
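
To make the suggestion concrete, here is a rough sketch of a group that is a 
plain helper rather than a Vertex subclass. Everything in it is hypothetical: 
the Vertex stand-in, the VertexGroup class and its accessor names are 
assumptions for illustration, not the Tez API or what the patch does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a real vertex; not the Tez class.
class Vertex {
    private final String name;

    Vertex(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// Hypothetical grouping helper. It deliberately does NOT extend Vertex, so it
// exposes no processor, parallelism, or monitoring methods.
class VertexGroup {
    private final String groupName;
    private final List<Vertex> members;

    VertexGroup(String groupName, List<Vertex> members) {
        this.groupName = groupName;
        // Defensive copy so the group's membership cannot change afterwards.
        this.members = Collections.unmodifiableList(new ArrayList<>(members));
    }

    String getGroupName() {
        return groupName;
    }

    List<Vertex> getMembers() {
        return members;
    }
}
```

Because VertexGroup shares no type with Vertex, it cannot be passed anywhere 
a real vertex is expected, which is the point of keeping it separate.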

On InputDescriptor associated with an AliasVertex: I'm assuming an AliasVertex 
could potentially generate multiple outputs, which can be linked to different 
downstream vertices via different edges. Associating an InputDescriptor with 
the vertex itself won't allow this. Unless I'm missing something, to achieve 
something like this, users would have to set up multiple Aliases/Groups for 
the same set of Vertices (can a vertex belong to multiple Aliases?). If this 
were associated with the edge itself (which is where Input/OutputDescriptors 
are defined), it should be possible to use the same alias/group for different 
Outputs and Edges generated by the same set of vertices. Something like 
addEdge(VertexGroup, Vertex, EdgeProperty, GroupInputDescriptor).
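
A minimal sketch of that addEdge shape, assuming the descriptor travels with 
the edge rather than with the group. All types here (Vertex, VertexGroup, 
EdgeProperty, GroupInputDescriptor, GroupEdge, DAG) are hypothetical 
stand-ins written for this example, and the field names are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// All types below are hypothetical stand-ins, not the real Tez classes.
class Vertex {
    final String name;

    Vertex(String name) {
        this.name = name;
    }
}

class VertexGroup {
    final String name;
    final List<Vertex> members;

    VertexGroup(String name, Vertex... members) {
        this.name = name;
        this.members = Arrays.asList(members);
    }
}

class EdgeProperty {
    final String movement; // illustrative label only

    EdgeProperty(String movement) {
        this.movement = movement;
    }
}

class GroupInputDescriptor {
    final String inputClassName;

    GroupInputDescriptor(String inputClassName) {
        this.inputClassName = inputClassName;
    }
}

// One edge from a group to a downstream vertex, carrying its own descriptor.
class GroupEdge {
    final VertexGroup source;
    final Vertex destination;
    final EdgeProperty property;
    final GroupInputDescriptor groupInput;

    GroupEdge(VertexGroup source, Vertex destination,
              EdgeProperty property, GroupInputDescriptor groupInput) {
        this.source = source;
        this.destination = destination;
        this.property = property;
        this.groupInput = groupInput;
    }
}

class DAG {
    private final List<GroupEdge> groupEdges = new ArrayList<>();

    // The descriptor is a property of the edge, so one group can feed
    // several downstream vertices with a different descriptor on each edge.
    DAG addEdge(VertexGroup from, Vertex to,
                EdgeProperty property, GroupInputDescriptor groupInput) {
        groupEdges.add(new GroupEdge(from, to, property, groupInput));
        return this;
    }

    List<GroupEdge> getGroupEdges() {
        return groupEdges;
    }
}
```

With this shape a single group can be passed to addEdge twice, once per 
downstream vertex, without defining a second alias for the same vertices.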

Nit: When adding an AliasVertex/GroupedVertex to a DAG (whether this is via 
addVertex or addVertexGroup) - I don't think users should need to add the 
individual vertices separately.
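
One way this could look, purely as a sketch: adding the group registers its 
members implicitly. The DAG, Vertex and VertexGroup classes below are 
hypothetical stand-ins, and the choice to make addVertex idempotent by name 
is an assumption of this example, not something the patch does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins, not the real Tez classes.
class Vertex {
    final String name;

    Vertex(String name) {
        this.name = name;
    }
}

class VertexGroup {
    final String name;
    final List<Vertex> members;

    VertexGroup(String name, Vertex... members) {
        this.name = name;
        this.members = Arrays.asList(members);
    }
}

class DAG {
    private final Map<String, Vertex> vertices = new LinkedHashMap<>();
    private final Map<String, VertexGroup> groups = new LinkedHashMap<>();

    // Assumption: adding a vertex whose name is already known is a no-op.
    DAG addVertex(Vertex v) {
        vertices.putIfAbsent(v.name, v);
        return this;
    }

    // Adding a group also registers every member vertex, so callers do not
    // have to add the individual vertices separately.
    DAG addVertexGroup(VertexGroup g) {
        groups.put(g.name, g);
        for (Vertex member : g.members) {
            addVertex(member);
        }
        return this;
    }

    // Vertex names in the order they were first registered.
    List<String> vertexNames() {
        return new ArrayList<>(vertices.keySet());
    }
}
```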

Output handling: I was expecting users would be able to specify a single 
committer which would run once for all vertices in the group, rather than each 
vertex running a committer. Currently the output semantics just end up 
creating a group of committers which will always be executed together. If we 
didn't have semantics to commit early, this wouldn't even be required?
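
A sketch of the single-committer behaviour I had in mind, under the 
assumption that the group's output is committed once, after every member 
vertex has succeeded. OutputCommitter here is a hypothetical one-method 
interface and GroupCommitTracker is a made-up name; neither is the Tez or 
Hadoop type.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical one-method committer; not the Hadoop/Tez OutputCommitter.
interface OutputCommitter {
    void commit();
}

// Tracks member-vertex completion and runs ONE committer for the whole group.
class GroupCommitTracker {
    private final Set<String> pending;
    private final OutputCommitter committer;
    private boolean committed;

    GroupCommitTracker(Collection<String> memberNames, OutputCommitter committer) {
        this.pending = new HashSet<>(memberNames);
        this.committer = committer;
    }

    // Called when a member vertex succeeds. Returns true only on the call
    // that actually triggers the group commit.
    boolean vertexSucceeded(String vertexName) {
        pending.remove(vertexName);
        if (pending.isEmpty() && !committed) {
            committer.commit();
            committed = true;
            return true;
        }
        return false;
    }
}
```

The contrast with the current behaviour is that nothing runs per vertex: the 
committer fires exactly once, when the last member of the group finishes.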

Will probably have some more comments on the patch itself as I go through it in 
detail. It is rather big!

> Support for union operations
> ----------------------------
>
>                 Key: TEZ-678
>                 URL: https://issues.apache.org/jira/browse/TEZ-678
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: TEZ-678.1.patch, TEZ-678.2.patch, TEZ-678.3.patch, 
> TEZ-678.4.patch, TEZ-678.5.patch
>
>
> Unions represent a collection of results obtained from different branches of 
> computation. The collection is a virtual operation that does not need to 
> execute any tasks. Subsequent operations can conveniently work on the union 
> named data set instead of each individual member of the union. While unions 
> can be implemented efficiently without additional support from Tez, having 
> API support can make it easier and less error-prone to implement.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
