[
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304395#comment-14304395
]
Bikas Saha commented on TEZ-391:
--------------------------------
Thanks for the doc. It gives a good overall picture.
We should probably name it GroupOutputEdge to be symmetric with GroupInputEdge.
Some parts are not clear. E.g. a group input edge actually expands into a set
of standard edges a long with the some metadata on the destination vertex that
enables the AM to send additional info to the destination TezChild. TezChild
uses the metadata to create a unified input that wraps around the member
inputs. This provides a merged view on top of the real inputs and failure
handling remains as is.
Is the design suggesting that a group output edge expand into standard edges
with additional metadata at the source vertex which will enable its TezChild to
provide a single output to its tasks even though there are multiple consumers?
The replicate event at TezChild vs keep it single needs some more thought. E.g.
replication would increase event memory by replica times. What happens to fault
tolerance? If a destination vertex reports an error about a shared source then
what should happen in other destination vertices that are sharing that source?
Related: When an output of a task is marked bad then it sends an InputFailed
event to its destination tasks. This happens in the AM and needs to be sent to
all destination tasks of a shared output. So the AM routing would need to take
into account shared outputs for this case. The OutputReportedFailedTransition
may need to be updated to consider the case the errors may be reported from
multiple vertices with different task counts.
Shared output to a data sink was already covered in the jira that added
GroupInputEdge. So we can skip that here.
Can it happen that a VertexGroup is connected to another VertexGroup? What use
case would that be? Until now standard vertices would be inputs a VertexGroup.
Shared edge will allow VertexGroups to be outputs to standard vertices.
> SharedEdge - Support for passing same output from a vertex as input to two
> different vertices
> ---------------------------------------------------------------------------------------------
>
> Key: TEZ-391
> URL: https://issues.apache.org/jira/browse/TEZ-391
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Rohini Palaniswamy
> Assignee: Jeff Zhang
> Attachments: Shared Edge Design.pdf, TEZ-391-WIP-1.patch,
> TEZ-391-WIP-2.patch, TEZ-391-WIP-3.patch
>
>
> We need this for lot of usecases. For cases where multi-query is turned off
> and for optimizing unions. Currently those are BROADCAST or ONE-ONE edges and
> we write the output multiple times.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)