[ https://issues.apache.org/jira/browse/PIG-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohini Palaniswamy updated PIG-3876:
------------------------------------
    Description:
MultiQueryOptimizerTez.java
{code}
// Detect diamond shape, we cannot merge it into split, since Tez
// does not handle double edge between vertexes
boolean sharedSucc = false;
if (getPlan().getSuccessors(successor)!=null) {
    for (TezOperator succ_successor : getPlan().getSuccessors(successor)) {
        if (succ_successors.contains(succ_successor)) {
            sharedSucc = true;
            break;
        }
    }

    succ_successors.addAll(getPlan().getSuccessors(successor));
}
{code}
SPLIT A INTO B if <condition>, C if <condition>;
D = JOIN B by x, C by x;
We would like to do
V1 - Split (B -> V2, C -> V2)
V2 - Join B and C
Without the check for shared successors, above plan is created but B and C create two separate edges between V1 and V2 which is not supported by Tez.
Since the splits are not merged into POSplit fully, we currently have
V1 - Split ( B-> V3, C-> V2 with just POValueOutputTez)
V2 - LocalRearrange and -> V3
V3 - Join B and C
We need to remove the check and merge them into the POSplit and fix this case to make B and C both write to same edge. Being more aggressive in multi-query increases performance.

  was:
MultiQueryOptimizerTez.java
{code}
// Detect diamond shape, we cannot merge it into split, since Tez
// does not handle double edge between vertexes
boolean sharedSucc = false;
if (getPlan().getSuccessors(successor)!=null) {
    for (TezOperator succ_successor : getPlan().getSuccessors(successor)) {
        if (succ_successors.contains(succ_successor)) {
            sharedSucc = true;
            break;
        }
    }

    succ_successors.addAll(getPlan().getSuccessors(successor));
}
{code}
SPLIT A INTO B if <condition>, C if <condition>;
D = JOIN B by x, C by x;
We would like to do
V1 - Split (B -> V2, C -> V2)
V2 - Join B and C
Without the check for shared successors, above plan is created but B and C create two separate edges between V1 and V2 which is not supported by Tez.
Since the splits are not merged into POSplit fully, we currently have
V1 - Split ( B-> V2, C-> V3 with just POValueOutputTez)
V2 - LocalRearrange and -> V4
V3 - LocalRearrange and -> V4
V4 - Join B and C
We need to remove the check and merge them into the POSplit and fix this case to make B and C both write to same edge. Being more aggressive in multi-query increases performance.

> Handle two outputs from split going to same input in MultiQueryOptimizer
> ------------------------------------------------------------------------
>
>                 Key: PIG-3876
>                 URL: https://issues.apache.org/jira/browse/PIG-3876
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Rohini Palaniswamy
>             Fix For: tez-branch
>
>
> MultiQueryOptimizerTez.java
> {code}
> // Detect diamond shape, we cannot merge it into split, since Tez
> // does not handle double edge between vertexes
> boolean sharedSucc = false;
> if (getPlan().getSuccessors(successor)!=null) {
>     for (TezOperator succ_successor :
>             getPlan().getSuccessors(successor)) {
>         if (succ_successors.contains(succ_successor)) {
>             sharedSucc = true;
>             break;
>         }
>     }
>
>     succ_successors.addAll(getPlan().getSuccessors(successor));
> }
> {code}
> SPLIT A INTO B if <condition>, C if <condition>;
> D = JOIN B by x, C by x;
> We would like to do
> V1 - Split (B -> V2, C -> V2)
> V2 - Join B and C
> Without the check for shared successors, above plan is created but B and C
> create two separate edges between V1 and V2 which is not supported by Tez.
> Since the splits are not merged into POSplit fully, we currently have
> V1 - Split ( B-> V3, C-> V2 with just POValueOutputTez)
> V2 - LocalRearrange and -> V3
> V3 - Join B and C
> We need to remove the check and merge them into the POSplit and fix this
> case to make B and C both write to same edge. Being more aggressive in
> multi-query increases performance.
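
For illustration only, the shared-successor check quoted above can be reproduced on a toy graph. This is a minimal sketch, not the Pig implementation: the class DiamondCheck, the method hasSharedSuccessor and the map-of-vertex-names plan are hypothetical stand-ins for TezOperator and the Tez operator plan. It flags the case where two successors of a split vertex feed the same downstream vertex, as V2 and V3 both feed V4 in the earlier plan.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the Tez operator plan: vertex name -> successor names.
final class DiamondCheck {
    private DiamondCheck() {}

    // Returns true when two different successors of `split` share a downstream
    // vertex, mirroring the sharedSucc loop in the quoted snippet.
    static boolean hasSharedSuccessor(Map<String, List<String>> plan, String split) {
        Set<String> seen = new HashSet<>();
        for (String successor : plan.getOrDefault(split, List.of())) {
            List<String> next = plan.getOrDefault(successor, List.of());
            for (String candidate : next) {
                if (seen.contains(candidate)) {
                    return true; // diamond: an earlier successor already reaches it
                }
            }
            seen.addAll(next);
        }
        return false;
    }

    public static void main(String[] args) {
        // V1 splits into V2 and V3, and both feed the join in V4.
        Map<String, List<String>> diamond = Map.of(
                "V1", List.of("V2", "V3"),
                "V2", List.of("V4"),
                "V3", List.of("V4"));
        System.out.println(hasSharedSuccessor(diamond, "V1"));
    }
}
```

In this sketch a true result corresponds to the situation where the optimizer currently declines to merge the successors into the split.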