Rohini Palaniswamy created PIG-3876:
---------------------------------------

             Summary: Handle two outputs from split going to same input in 
MultiQueryOptimizer
                 Key: PIG-3876
                 URL: https://issues.apache.org/jira/browse/PIG-3876
             Project: Pig
          Issue Type: Sub-task
            Reporter: Rohini Palaniswamy
             Fix For: tez-branch


MultiQueryOptimizerTez.java

{code}
// Detect diamond shape, we cannot merge it into split, since Tez
                // does not handle double edge between vertexes
                boolean sharedSucc = false;
                if (getPlan().getSuccessors(successor)!=null) {
                    for (TezOperator succ_successor : 
getPlan().getSuccessors(successor)) {
                        if (succ_successors.contains(succ_successor)) {
                            sharedSucc = true;
                            break;
                        }
                    }
                    succ_successors.addAll(getPlan().getSuccessors(successor));
                }
{code}

SPLIT A INTO B if <condition>, C if <condition>;
D = JOIN B by x, C by x;

We would like to do 
V1 - Split (B -> V2, C -> V2)
V2 - Join B and C

Without the check for shared successors, above plan is created but B and C 
create two separate edges between V1 and V2 which is not supported by Tez. 
Since the splits are not merged into POSplit fully, we currently have

V1 - Split ( B-> V2, C-> V3 with just POValueOutputTez)
V2 -  LocalRearrange and -> V4
V3 -  LocalRearrange and -> V4
V4 - Join B and C

 We need to remove the check and merge them into the POSplit and fix this case 
to make B and C both write to same edge. Being more aggressive in multi-query 
increases performance.

 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to