[ https://issues.apache.org/jira/browse/PIG-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-3876:
------------------------------------

    Description: 
MultiQueryOptimizerTez.java

{code}
// Detect diamond shape, we cannot merge it into split, since Tez
// does not handle double edge between vertexes
boolean sharedSucc = false;
if (getPlan().getSuccessors(successor) != null) {
    for (TezOperator succ_successor : getPlan().getSuccessors(successor)) {
        if (succ_successors.contains(succ_successor)) {
            sharedSucc = true;
            break;
        }
    }
    succ_successors.addAll(getPlan().getSuccessors(successor));
}
{code}
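
The check above can be tried out in isolation. The following is only a minimal sketch, assuming the plan is reduced to a map from vertex name to the names of its direct successors; DiamondCheck and hasSharedSuccessor are hypothetical names used for illustration and are not part of Pig or Tez.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-alone model of the shared-successor check; not Pig code.
final class DiamondCheck {

    /**
     * Returns true if two of the given split successors lead to a common
     * vertex, i.e. the plan contains a diamond.
     *
     * @param plan            vertex name -> names of its direct successors
     * @param splitSuccessors vertices fed directly by the split
     */
    static boolean hasSharedSuccessor(Map<String, List<String>> plan,
                                      List<String> splitSuccessors) {
        // Successors of the split successors visited so far.
        Set<String> seen = new HashSet<>();
        for (String successor : splitSuccessors) {
            List<String> next = plan.get(successor);
            if (next == null) {
                continue; // this successor has no outgoing edges
            }
            for (String candidate : next) {
                if (seen.contains(candidate)) {
                    return true; // an earlier branch already reaches this vertex
                }
            }
            seen.addAll(next);
        }
        return false;
    }

    public static void main(String[] args) {
        // V2 and V3 both feed V4, as in a split followed by a join.
        Map<String, List<String>> diamond =
                Map.of("V2", List.of("V4"), "V3", List.of("V4"));
        System.out.println(hasSharedSuccessor(diamond, List.of("V2", "V3")));
    }
}
```

As in the original snippet, a branch is only compared against the successors of branches visited before it, so a diamond is reported when the second branch reaches the shared vertex.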

SPLIT A INTO B if <condition>, C if <condition>;
D = JOIN B by x, C by x;

We would like to generate the following plan:
V1 - Split (B -> V2, C -> V2)
V2 - Join B and C

Without the check for shared successors, the above plan is created, but B and C
then create two separate edges between V1 and V2, which Tez does not support.
Because the splits are not fully merged into POSplit, we currently have:

V1 - Split (B -> V3, C -> V2 with just POValueOutputTez)
V2 - LocalRearrange and -> V3
V3 - Join B and C

We need to remove the check, merge both branches into the POSplit, and fix this
case so that B and C both write to the same edge. Being more aggressive in
multi-query optimization increases performance.
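
One way to picture "B and C both write to the same edge" is to group the split outputs by destination vertex before creating edges. This is only an illustrative sketch, not the Pig or Tez API: EdgeGrouping and groupByDestination are hypothetical names, and how the join would tell B's records from C's on the shared edge is not covered here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating one edge per destination vertex; not Pig code.
final class EdgeGrouping {

    /**
     * Groups split outputs by destination so that each destination vertex
     * gets exactly one edge, however many split branches feed it.
     *
     * @param outputs branch alias (e.g. "B") -> destination vertex (e.g. "V2")
     * @return destination vertex -> aliases that share the single edge to it
     */
    static Map<String, List<String>> groupByDestination(Map<String, String> outputs) {
        Map<String, List<String>> edges = new LinkedHashMap<>();
        for (Map.Entry<String, String> output : outputs.entrySet()) {
            edges.computeIfAbsent(output.getValue(), k -> new ArrayList<>())
                 .add(output.getKey());
        }
        return edges;
    }

    public static void main(String[] args) {
        Map<String, String> outputs = new LinkedHashMap<>();
        outputs.put("B", "V2");
        outputs.put("C", "V2");
        // One entry for V2 carrying both B and C, instead of two V1 -> V2 edges.
        System.out.println(groupByDestination(outputs));
    }
}
```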

 

  was:
MultiQueryOptimizerTez.java

{code}
// Detect diamond shape, we cannot merge it into split, since Tez
// does not handle double edge between vertexes
boolean sharedSucc = false;
if (getPlan().getSuccessors(successor) != null) {
    for (TezOperator succ_successor : getPlan().getSuccessors(successor)) {
        if (succ_successors.contains(succ_successor)) {
            sharedSucc = true;
            break;
        }
    }
    succ_successors.addAll(getPlan().getSuccessors(successor));
}
{code}

SPLIT A INTO B if <condition>, C if <condition>;
D = JOIN B by x, C by x;

We would like to generate the following plan:
V1 - Split (B -> V2, C -> V2)
V2 - Join B and C

Without the check for shared successors, the above plan is created, but B and C
then create two separate edges between V1 and V2, which Tez does not support.
Because the splits are not fully merged into POSplit, we currently have:

V1 - Split (B -> V2, C -> V3 with just POValueOutputTez)
V2 - LocalRearrange and -> V4
V3 - LocalRearrange and -> V4
V4 - Join B and C

We need to remove the check, merge both branches into the POSplit, and fix this
case so that B and C both write to the same edge. Being more aggressive in
multi-query optimization increases performance.

 


> Handle two outputs from split going to same input in MultiQueryOptimizer
> ------------------------------------------------------------------------
>
>                 Key: PIG-3876
>                 URL: https://issues.apache.org/jira/browse/PIG-3876
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Rohini Palaniswamy
>             Fix For: tez-branch
>
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)
