Rohini Palaniswamy created PIG-3876:
---------------------------------------

             Summary: Handle two outputs from split going to same input in MultiQueryOptimizer
                 Key: PIG-3876
                 URL: https://issues.apache.org/jira/browse/PIG-3876
             Project: Pig
          Issue Type: Sub-task
            Reporter: Rohini Palaniswamy
             Fix For: tez-branch


MultiQueryOptimizerTez.java
{code}
// Detect diamond shape, we cannot merge it into split, since Tez
// does not handle double edge between vertexes
boolean sharedSucc = false;
if (getPlan().getSuccessors(successor)!=null) {
    for (TezOperator succ_successor : getPlan().getSuccessors(successor)) {
        if (succ_successors.contains(succ_successor)) {
            sharedSucc = true;
            break;
        }
    }
    succ_successors.addAll(getPlan().getSuccessors(successor));
}
{code}

SPLIT A INTO B if <condition>, C if <condition>;
D = JOIN B by x, C by x;

We would like to produce:
V1 - Split (B -> V2, C -> V2)
V2 - Join B and C

Without the check for shared successors, the above plan is created, but B and C create two separate edges between V1 and V2, which Tez does not support. Since the splits are not fully merged into the POSplit, we currently have:

V1 - Split (B -> V2, C -> V3 with just POValueOutputTez)
V2 - LocalRearrange and -> V4
V3 - LocalRearrange and -> V4
V4 - Join B and C

We need to remove the check, merge them into the POSplit, and fix this case so that B and C both write to the same edge. Merging more aggressively in the multi-query optimizer improves performance.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
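
For illustration, the shared-successor (diamond) check quoted above can be reproduced on a plain adjacency map. This is only a minimal sketch, not the Pig code: the class `SharedSuccessorSketch`, the method `hasSharedSuccessor`, and the use of `Map<String, List<String>>` in place of the Tez operator plan are all hypothetical, and the outer loop over the split's successors is an assumption about the code surrounding the quoted snippet.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the plan: vertex name -> list of successor names.
class SharedSuccessorSketch {

    // Returns true if two successors of splitVertex feed the same downstream
    // vertex, i.e. merging them into the split would need a double edge.
    static boolean hasSharedSuccessor(Map<String, List<String>> plan, String splitVertex) {
        Set<String> seen = new HashSet<>(); // plays the role of succ_successors
        for (String successor : plan.getOrDefault(splitVertex, Collections.<String>emptyList())) {
            List<String> next = plan.getOrDefault(successor, Collections.<String>emptyList());
            for (String succSuccessor : next) {
                if (seen.contains(succSuccessor)) {
                    return true; // diamond shape detected
                }
            }
            seen.addAll(next);
        }
        return false;
    }

    public static void main(String[] args) {
        // Current plan from the description: V1 -> V2 -> V4 and V1 -> V3 -> V4.
        Map<String, List<String>> plan = new HashMap<>();
        plan.put("V1", Arrays.asList("V2", "V3"));
        plan.put("V2", Arrays.asList("V4"));
        plan.put("V3", Arrays.asList("V4"));
        System.out.println(hasSharedSuccessor(plan, "V1")); // prints "true"
    }
}
```

In this toy plan V2 and V3 both lead to V4, so the check fires and the merge into the split is skipped, which is the behavior this issue proposes to replace with a single shared edge.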