[ 
https://issues.apache.org/jira/browse/TEZ-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304395#comment-14304395
 ] 

Bikas Saha commented on TEZ-391:
--------------------------------

Thanks for the doc. It gives a good overall picture.

We should probably name it GroupOutputEdge to be symmetric with GroupInputEdge.

Some parts are not clear. E.g. a group input edge actually expands into a set 
of standard edges a long with the some metadata on the destination vertex that 
enables the AM to send additional info to the destination TezChild. TezChild 
uses the metadata to create a unified input that wraps around the member 
inputs. This provides a merged view on top of the real inputs and failure 
handling remains as is.
Is the design suggesting that a group output edge expand into standard edges 
with additional metadata at the source vertex which will enable its TezChild to 
provide a single output to its tasks even though there are multiple consumers?

The replicate event at TezChild vs keep it single needs some more thought. E.g. 
replication would increase event memory by replica times. What happens to fault 
tolerance? If a destination vertex reports an error about a shared source then 
what should happen in other destination vertices that are sharing that source? 
Related: When an output of a task is marked bad then it sends an InputFailed 
event to its destination tasks. This happens in the AM and needs to be sent to 
all destination tasks of a shared output. So the AM routing would need to take 
into account shared outputs for this case. The OutputReportedFailedTransition 
may need to be updated to consider the case the errors may be reported from 
multiple vertices with different task counts.

Shared output to a data sink was already covered in the jira that added 
GroupInputEdge. So we can skip that here.

Can it happen that a VertexGroup is connected to another VertexGroup? What use 
case would that be? Until now standard vertices would be inputs a VertexGroup. 
Shared edge will allow VertexGroups to be outputs to standard vertices.

> SharedEdge - Support for passing same output from a vertex as input to two 
> different vertices
> ---------------------------------------------------------------------------------------------
>
>                 Key: TEZ-391
>                 URL: https://issues.apache.org/jira/browse/TEZ-391
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Rohini Palaniswamy
>            Assignee: Jeff Zhang
>         Attachments: Shared Edge Design.pdf, TEZ-391-WIP-1.patch, 
> TEZ-391-WIP-2.patch, TEZ-391-WIP-3.patch
>
>
>   We need this for lot of usecases. For cases where multi-query is turned off 
> and for optimizing unions. Currently those are BROADCAST or ONE-ONE edges and 
> we write the output multiple times.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to