[ https://issues.apache.org/jira/browse/TEZ-678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887947#comment-13887947 ]
Siddharth Seth commented on TEZ-678:
------------------------------------
bq. Vertices can be handled independently, while some operations may be
performed on them as part of a group. Vertices may belong to multiple groups, so
adding member vertices via DAG.addVertexGroup() does not make sense in general.
DAG.createVertexGroup(Vertices) is not a static method. It is called on the DAG
object to create a group of its vertices.
Static or non-static, I don't believe this needs to be on the DAG; it can just
be a new VertexGroup / AliasVertex (whatever it's now called).
Even if vertices belong to multiple groups, it should be fairly straightforward
for us to ensure that they are registered only once. This is how I'd expect to
use it, in terms of simplicity and usability:
Vertices: v1, v2, v3.
vg = VertexGroup(v2, v3)
dag.addVertex(v1)
dag.addVertexGroup(vg)
dag.addEdge(v1, vg ...)
We obviously disagree on what this API should look like. If you think the above
usage makes sense, please add the changes; otherwise this can just go in as is.
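For illustration, the "registered only once" behaviour described above can be sketched in a few lines of Java. SketchVertex, SketchVertexGroup and SketchDag are hypothetical stand-ins, not the Tez API classes; the sketch only assumes that vertex names are unique within a DAG.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a DAG vertex (NOT org.apache.tez.dag.api.Vertex).
final class SketchVertex {
  final String name;
  SketchVertex(String name) { this.name = name; }
}

// Hypothetical group: a named alias over member vertices; it runs no tasks itself.
final class SketchVertexGroup {
  final String name;
  final List<SketchVertex> members;
  SketchVertexGroup(String name, SketchVertex... members) {
    this.name = name;
    this.members = Arrays.asList(members);
  }
}

// Hypothetical DAG that keeps vertex registration idempotent.
final class SketchDag {
  // Keyed by vertex name; insertion order is preserved.
  private final Map<String, SketchVertex> vertices = new LinkedHashMap<>();

  // Re-adding the same object is a no-op; a different vertex reusing a name is rejected.
  void addVertex(SketchVertex v) {
    SketchVertex existing = vertices.putIfAbsent(v.name, v);
    if (existing != null && existing != v) {
      throw new IllegalArgumentException("Duplicate vertex name: " + v.name);
    }
  }

  // Registers every member; members already added directly or via another group are skipped.
  void addVertexGroup(SketchVertexGroup group) {
    for (SketchVertex member : group.members) {
      addVertex(member);
    }
  }

  List<String> vertexNames() { return new ArrayList<>(vertices.keySet()); }
}
```

With v1 added directly and v3 appearing in two groups, each vertex still ends up registered exactly once, in line with the expectation that membership in several groups need not cause duplicate registration.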
bq. This follows the same logic as an existing transition.
checkStateForCompletion() returns whether or not further completions are
expected, and based on that we decide whether to wait in the TERMINATING state
or go immediately to a final state.
Maybe this needs to be looked at as part of a separate JIRA. I'll look at this
again when going through the patch; it may have been related only to the
COMMIT_FAILURE TerminationCause.
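A rough sketch of the transition described in the quoted reply, assuming a hypothetical vertex that only tracks how many of its tasks have completed. moreCompletionsExpected() plays the role the reply attributes to checkStateForCompletion(); the real Tez method's signature and behaviour may differ, and the normal success path is not modeled.

```java
// Hypothetical subset of vertex states; the real Tez VertexState enum has more values.
enum SketchVertexState { RUNNING, TERMINATING, KILLED, FAILED }

final class SketchTerminatingVertex {
  private final int totalTasks;
  private int completedTasks = 0;
  private SketchVertexState state = SketchVertexState.RUNNING;
  // Final state to enter once outstanding completions have drained.
  private SketchVertexState pendingFinalState = null;

  SketchTerminatingVertex(int totalTasks) { this.totalTasks = totalTasks; }

  SketchVertexState state() { return state; }

  // Stand-in for the check described above: are further completions expected?
  private boolean moreCompletionsExpected() { return completedTasks < totalTasks; }

  // Termination request: wait in TERMINATING while tasks are outstanding,
  // otherwise move straight to the requested final state.
  void terminate(SketchVertexState finalState) {
    if (state != SketchVertexState.RUNNING) {
      return; // ignore repeated termination requests in this sketch
    }
    if (moreCompletionsExpected()) {
      pendingFinalState = finalState;
      state = SketchVertexState.TERMINATING;
    } else {
      state = finalState;
    }
  }

  // A task finished; leave TERMINATING once nothing more is expected.
  void onTaskCompleted() {
    completedTasks++;
    if (state == SketchVertexState.TERMINATING && !moreCompletionsExpected()) {
      state = pendingFinalState;
    }
  }
}
```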
bq. Did not quite get this. There is no such VertexState and COMMIT_FAILURE is
a termination reason for diagnostics.
That was meant to be TerminationCause - sending COMMIT_FAILURE as the
TerminationCause doesn't appear to be handled.
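One way to ensure a termination cause such as COMMIT_FAILURE cannot be silently dropped is to route every cause through a switch whose default case fails loudly. The enum constants other than COMMIT_FAILURE, and the choice of FAILED as the outcome of a commit failure, are assumptions for illustration only, not Tez's actual handling.

```java
// Hypothetical subset of termination causes; the real Tez enum may differ.
enum SketchTerminationCause { DAG_KILL, VERTEX_FAILURE, COMMIT_FAILURE }

enum SketchFinalState { KILLED, FAILED }

final class SketchTermination {
  // Maps a termination cause to a final state; an unmapped cause throws
  // instead of being ignored.
  static SketchFinalState finalStateFor(SketchTerminationCause cause) {
    switch (cause) {
      case DAG_KILL:
        return SketchFinalState.KILLED;
      case VERTEX_FAILURE:
      case COMMIT_FAILURE: // assumed to fail the vertex in this sketch
        return SketchFinalState.FAILED;
      default:
        throw new IllegalStateException("Unhandled termination cause: " + cause);
    }
  }
}
```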
Will take a look at the patch in a bit.
> Support for union operations
> ----------------------------
>
> Key: TEZ-678
> URL: https://issues.apache.org/jira/browse/TEZ-678
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Bikas Saha
> Assignee: Bikas Saha
> Attachments: TEZ-678.1.patch, TEZ-678.2.patch, TEZ-678.3.patch,
> TEZ-678.4.patch, TEZ-678.5.patch, TEZ-678.6.patch, TEZ-678.7.patch,
> TEZ-678.8.patch
>
>
> Unions represent a collection of results obtained from different branches of
> computation. The collection is a virtual operation that does not need to
> execute any tasks. Subsequent operations can conveniently work on the union as
> a single named data set instead of on each individual member of the union.
> While unions can be implemented efficiently without additional support from
> Tez, having API support can make them easier and less error-prone to implement.
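To illustrate why a union needs no tasks of its own, here is a minimal sketch in which a logical edge from a group to a consumer is expanded into one physical edge per member, each tagged with the group's name so the consumer can read them as a single logical input. UnionGroup, UnionPlanner and PhysicalEdge are hypothetical names, not Tez classes, and the expansion strategy is an assumption rather than the approach taken in the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical union alias: names a set of producer vertices and runs no tasks.
final class UnionGroup {
  final String name;
  final List<String> members;
  UnionGroup(String name, String... members) {
    this.name = name;
    this.members = Arrays.asList(members);
  }
}

final class UnionPlanner {
  // Physical edge from one producer to the consumer, tagged with the union's
  // name so the consumer can treat all such edges as one logical input.
  static final class PhysicalEdge {
    final String from;
    final String to;
    final String logicalInput;
    PhysicalEdge(String from, String to, String logicalInput) {
      this.from = from;
      this.to = to;
      this.logicalInput = logicalInput;
    }
    @Override public String toString() { return from + "->" + to + " as " + logicalInput; }
  }

  // Expands the logical edge (group -> consumer) into one edge per member.
  static List<PhysicalEdge> expand(UnionGroup group, String consumer) {
    List<PhysicalEdge> edges = new ArrayList<>();
    for (String member : group.members) {
      edges.add(new PhysicalEdge(member, consumer, group.name));
    }
    return edges;
  }
}
```

Expanding UnionGroup("u", "v2", "v3") toward consumer "v4" yields the edges v2->v4 and v3->v4, both labeled "u"; no task is ever scheduled for "u" itself.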
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)