[ https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282355#comment-14282355 ]
Jeff Zhang commented on TEZ-391:
--------------------------------
Attached a patch for SharedEdge:
* Adds a new API in Edge to create a shared edge
{code}
public Edge createSharedEdge(Vertex outputVertex)
{code}
* Currently it supports only One-to-One and Broadcast. (ScatterGather requires the
2 downstream vertices to have the same parallelism, otherwise the shuffle will break.
I made some changes to get ScatterGather working, but it still needs
more work, especially for reducer auto-parallelism.)
* Adds an example in tez-example to show the usage (SharedEdgeExample).
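The parallelism restriction on ScatterGather can be illustrated with a minimal sketch. This is not code from the patch: the class SharedEdgeCheck, the Movement enum, and the methods partitionFor and canShare are hypothetical names, and hash partitioning is assumed only for illustration. The point is that a producer splits its output into one partition per consumer task, so a single written output can serve two consumers only if both expect the same number of partitions.

```java
public class SharedEdgeCheck {
    // Hypothetical stand-in for the edge types named above; not the Tez enum.
    enum Movement { ONE_TO_ONE, BROADCAST, SCATTER_GATHER }

    // Assumed hash partitioning: the partition a key lands in depends on
    // how many consumer tasks the producer is partitioning for.
    static int partitionFor(String key, int consumerParallelism) {
        return Math.floorMod(key.hashCode(), consumerParallelism);
    }

    // One written output can be shared by two consumers if the layout does not
    // depend on consumer parallelism, or if both consumers have the same parallelism.
    static boolean canShare(Movement movement, int parallelismA, int parallelismB) {
        if (movement != Movement.SCATTER_GATHER) {
            return true; // the two edge types the patch supports
        }
        return parallelismA == parallelismB;
    }

    public static void main(String[] args) {
        System.out.println(canShare(Movement.BROADCAST, 4, 8));      // true
        System.out.println(canShare(Movement.SCATTER_GATHER, 4, 4)); // true
        System.out.println(canShare(Movement.SCATTER_GATHER, 4, 8)); // false
    }
}
```

With differing parallelism, partitionFor would place the same key in different partitions for each consumer, which is why one partitioned output cannot be read correctly by both.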
Although this patch works, after more thought I think using VertexGroup may
be more natural and easier to understand: we just need to make the 2 downstream
vertices a vertex group and connect the upstream vertex to that vertex
group. VertexGroup is currently used for shared output, so it is natural to make
it support shared input as well. I will attach a new patch that uses VertexGroup
later.
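The idea can be sketched with a toy model. The types below (VertexGroupSketch, Vertex, Edge, outputsWritten) are hypothetical and are not the Tez classes; the model simply assumes that a vertex writes its output once per outgoing edge, so putting both consumers behind a single edge means the output is written once.

```java
import java.util.Arrays;
import java.util.List;

public class VertexGroupSketch {
    // Hypothetical toy types for illustration only; not the Tez API.
    static final class Vertex {
        final String name;
        Vertex(String name) { this.name = name; }
    }

    static final class Edge {
        final Vertex from;
        final List<Vertex> readers; // one reader, or the members of a group
        Edge(Vertex from, Vertex... readers) {
            this.from = from;
            this.readers = Arrays.asList(readers);
        }
    }

    // Assumption of this model: output is materialized once per outgoing edge.
    static long outputsWritten(Vertex v, List<Edge> edges) {
        return edges.stream().filter(e -> e.from == v).count();
    }

    public static void main(String[] args) {
        Vertex up = new Vertex("upstream");
        Vertex a = new Vertex("consumerA");
        Vertex b = new Vertex("consumerB");

        // One edge per consumer: the same output is written twice.
        List<Edge> separate = Arrays.asList(new Edge(up, a), new Edge(up, b));
        // Both consumers grouped behind a single edge: written once.
        List<Edge> grouped = Arrays.asList(new Edge(up, a, b));

        System.out.println(outputsWritten(up, separate)); // 2
        System.out.println(outputsWritten(up, grouped));  // 1
    }
}
```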
> SharedEdge - Support for passing same output from a vertex as input to two
> different vertices
> ---------------------------------------------------------------------------------------------
>
> Key: TEZ-391
> URL: https://issues.apache.org/jira/browse/TEZ-391
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Rohini Palaniswamy
> Assignee: Jeff Zhang
> Attachments: TEZ-391-WIP-1.patch
>
>
> We need this for a lot of use cases, for example when multi-query is turned off
> and when optimizing unions. Currently those are BROADCAST or ONE-ONE edges, and
> we write the output multiple times.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)