[ 
https://issues.apache.org/jira/browse/PIG-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394999#comment-14394999
 ] 

Rohini Palaniswamy commented on PIG-4495:
-----------------------------------------

I also realized that we can do it with a single edge for self joins as well. 
Both splits can write to the same edge of the vertex. We will have to modify 
POShuffleTezLoad on the successor vertex, or add a variant of it, so that it 
checks the index and separates the input into two bags before sending them to 
the Packager, similar to what is done in MR. Need to check self CROSS though. 
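
To illustrate the index check, here is a rough sketch of the separation step. 
It is not Pig code: SelfJoinBagSplitter, IndexedTuple and separate() are 
hypothetical names made up for this example, and plain Java lists stand in for 
Pig bags. It only assumes that every value arriving on the shared edge still 
carries the index of the join input it came from.

```java
import java.util.ArrayList;
import java.util.List;

public class SelfJoinBagSplitter {

    /** Hypothetical stand-in for a shuffled value tagged with its input index. */
    public static final class IndexedTuple {
        final int index;     // 0 = first join input, 1 = second join input
        final String value;  // payload; a real tuple would carry more fields

        public IndexedTuple(int index, String value) {
            this.index = index;
            this.value = value;
        }
    }

    /**
     * Separates the values received for one key on a single edge into one
     * list ("bag") per input index, preserving arrival order within each bag.
     */
    public static List<List<String>> separate(List<IndexedTuple> tuples, int numInputs) {
        List<List<String>> bags = new ArrayList<>();
        for (int i = 0; i < numInputs; i++) {
            bags.add(new ArrayList<>());
        }
        for (IndexedTuple t : tuples) {
            if (t.index < 0 || t.index >= numInputs) {
                throw new IllegalArgumentException("Unexpected input index: " + t.index);
            }
            bags.get(t.index).add(t.value);
        }
        return bags;
    }
}
```

For a self join, separate(tuples, 2) would yield the two bags that are then 
handed to the packaging step. Whether self CROSS can be handled the same way 
is still open.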

> Better multi-query planning in case of union and multiple edges
> ---------------------------------------------------------------
>
>                 Key: PIG-4495
>                 URL: https://issues.apache.org/jira/browse/PIG-4495
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>    Affects Versions: 0.14.0
>            Reporter: Rohini Palaniswamy
>             Fix For: 0.15.0
>
>
> Details in 
> https://issues.apache.org/jira/browse/TEZ-1190?focusedCommentId=14393033&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14393033
> People split the data, perform some foreach transformations/filters, union 
> them and then do some operation like group by or join with other data. In 
> those cases the plan has multiple edges from the same Split, so we do not 
> merge them. Instead we write out the data to another dummy vertex to avoid 
> multiple edges, which adds overhead and affects performance. Vertex groups 
> accept multiple edges from the same vertex. So if the multiple edges end up 
> in a vertex group (and not a vertex, which is the case in self join), we can 
> avoid the dummy vertex.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
