[
https://issues.apache.org/jira/browse/BEAM-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933041#comment-16933041
]
sridhar Reddy commented on BEAM-7049:
-------------------------------------
I ran the tests with the following steps:
# Re-cloned a fresh copy of the Beam repo
# Added {{UnionMergeRule}}
# Changed {{BeamCostModel.FACTORY}} to {{null}} at
[https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/CalciteQueryPlanner.java#L116]
When using the query
{code:sql}
SELECT 1
UNION ALL
SELECT 2
UNION ALL
SELECT 3
UNION ALL
SELECT 4
UNION ALL
SELECT 5
{code}
it didn't work as expected: the SQLPlan was generated, but the BEAMPlan was
not. Here is the condensed stack trace:
-------------
java.lang.RuntimeException: Error while applying rule BeamUnionRule, args [rel#45:LogicalUnion.NONE(input#0=RelSubset#42,input#1=RelSubset#44,all=true)]
	at org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:235)
	at org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:631)
Caused by: java.lang.RuntimeException: Error occurred while applying rule BeamUnionRule
	at org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:143)
	at org.apache.beam.repackaged.sql.org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:236)
Caused by: java.lang.ClassCastException: org.apache.beam.sdk.extensions.sql.impl.planner.BeamCostModel cannot be cast to org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoCost
	at org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoCost.isLt(VolcanoCost.java:112)
	at org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.getCost(VolcanoPlanner.java:930)
	at org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements0(RelSubset.java:347)
---------
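One possible reading of this trace: with {{BeamCostModel.FACTORY}} replaced by {{null}}, the Volcano planner presumably falls back to its default {{VolcanoCost}} factory, while the Beam rels keep reporting {{BeamCostModel}} costs, so {{VolcanoCost.isLt}} ends up casting a {{BeamCostModel}} (the frame at VolcanoCost.java:112). The stdlib-only sketch below reproduces that failure mode under this assumption. {{Cost}}, {{CostFactory}}, {{VolcanoLikeCost}}, {{BeamLikeCost}} and {{effectiveFactory}} are hypothetical stand-ins, not Calcite or Beam classes.

```java
public class CostMixSketch {
  // Hypothetical stand-in for Calcite's RelOptCost.
  interface Cost {
    boolean isLt(Cost other);
  }

  // Hypothetical stand-in for a cost factory.
  interface CostFactory {
    Cost make(double value);
  }

  // Mimics the isLt frame in the trace: it casts the other cost to its own
  // type without checking, so comparing against another Cost class throws.
  static final class VolcanoLikeCost implements Cost {
    final double rowCount;

    VolcanoLikeCost(double rowCount) { this.rowCount = rowCount; }

    @Override
    public boolean isLt(Cost other) {
      VolcanoLikeCost that = (VolcanoLikeCost) other;
      return rowCount < that.rowCount;
    }
  }

  // Hypothetical stand-in for BeamCostModel, with the same unchecked cast.
  static final class BeamLikeCost implements Cost {
    final double rate;

    BeamLikeCost(double rate) { this.rate = rate; }

    @Override
    public boolean isLt(Cost other) {
      BeamLikeCost that = (BeamLikeCost) other;
      return rate < that.rate;
    }
  }

  // Assumed planner behavior: a null factory falls back to the Volcano-style default.
  static CostFactory effectiveFactory(CostFactory configured) {
    return configured != null ? configured : VolcanoLikeCost::new;
  }

  // Compares a planner-made cost with a rel's self-cost; reports a cast failure as text.
  static String compare(CostFactory configured, Cost relSelfCost) {
    Cost plannerCost = effectiveFactory(configured).make(1.0);
    try {
      return "isLt=" + plannerCost.isLt(relSelfCost);
    } catch (ClassCastException e) {
      return "ClassCastException";
    }
  }

  public static void main(String[] args) {
    Cost beamRelCost = new BeamLikeCost(2.0); // rels keep reporting Beam-style costs
    System.out.println(compare(BeamLikeCost::new, beamRelCost)); // factory matches the rels
    System.out.println(compare(null, beamRelCost));              // factory set to null
  }
}
```

The first call prints {{isLt=true}} and the second prints {{ClassCastException}}: the comparison only breaks when the planner's cost type and the rels' cost type diverge.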
The same result is observed for the following query as well (plain UNION, no UNION ALL):
{code:sql}
SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5
{code}
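As a reference point for the input counts discussed next, a minimal sketch of what the two queries should return, using plain Java lists as rows. {{unionAll}} and {{unionDistinct}} are hypothetical helpers, not Beam or Calcite APIs. Because the five literals are all distinct, both UNION ALL and UNION should yield five rows.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class UnionSemanticsSketch {
  // UNION ALL keeps every row from every input (bag semantics).
  static List<Integer> unionAll(List<List<Integer>> inputs) {
    return inputs.stream().flatMap(List::stream).collect(Collectors.toList());
  }

  // UNION (DISTINCT) additionally drops duplicate rows (set semantics).
  static List<Integer> unionDistinct(List<List<Integer>> inputs) {
    return inputs.stream().flatMap(List::stream).distinct().collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // One single-row input per SELECT: SELECT 1, SELECT 2, ..., SELECT 5.
    List<List<Integer>> selects =
        Stream.of(1, 2, 3, 4, 5).map(v -> List.of(v)).collect(Collectors.toList());
    System.out.println("UNION ALL: " + unionAll(selects));
    System.out.println("UNION:     " + unionDistinct(selects));
  }
}
```

Both lines print {{[1, 2, 3, 4, 5]}}; the two operators only differ when inputs share values.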
However, on the fresh clone without modifications, the "union all" query works
as expected and sends 5 inputs, but the "union" query only sends 3 inputs. This
can also be seen in the shortened BEAMPlans:
{noformat}
INFO: BEAMPlan>
BeamUnionRel(all=[true])
  BeamCalcRel(expr#0=[{inputs}], expr#1=[1], EXPR$0=[$t1])
    BeamValuesRel(tuples=[[{ 0 }]])
  BeamCalcRel(expr#0=[{inputs}], expr#1=[2], EXPR$0=[$t1])
    BeamValuesRel(tuples=[[{ 0 }]])
  BeamCalcRel(expr#0=[{inputs}], expr#1=[3], EXPR$0=[$t1])
    BeamValuesRel(tuples=[[{ 0 }]])
  BeamCalcRel(expr#0=[{inputs}], expr#1=[4], EXPR$0=[$t1])
    BeamValuesRel(tuples=[[{ 0 }]])
  BeamCalcRel(expr#0=[{inputs}], expr#1=[5], EXPR$0=[$t1])
    BeamValuesRel(tuples=[[{ 0 }]])
{noformat}
vs
{noformat}
INFO: BEAMPlan>
BeamUnionRel(all=[false])
  BeamCalcRel(expr#0=[{inputs}], expr#1=[1], EXPR$0=[$t1])
    BeamValuesRel(tuples=[[{ 0 }]])
  BeamCalcRel(expr#0=[{inputs}], expr#1=[2], EXPR$0=[$t1])
    BeamValuesRel(tuples=[[{ 0 }]])
  BeamCalcRel(expr#0=[{inputs}], expr#1=[3], EXPR$0=[$t1])
    BeamValuesRel(tuples=[[{ 0 }]])
{noformat}
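The five-input {{BeamUnionRel(all=[true])}} in the first plan has the shape the issue asks for: one n-ary union instead of a chain of binary ones. A minimal sketch of that flattening over a hypothetical plan-node class ({{Node}}, {{flatten}} and {{shuffles}} are illustrative, not Beam or Calcite APIs), assuming, as the issue description does, one shuffle per union node. It only merges a child union into its parent when both have the same ALL flag, which is always semantics-preserving; Calcite's {{UnionMergeRule}} may merge in more cases, so treat this as a simplification.

```java
import java.util.ArrayList;
import java.util.List;

public class UnionFlattenSketch {
  // Hypothetical plan node: a leaf input (name != null) or a UNION over inputs.
  static final class Node {
    final String name;       // non-null only for leaves
    final boolean all;       // UNION ALL vs UNION (DISTINCT); unused for leaves
    final List<Node> inputs; // empty for leaves

    private Node(String name, boolean all, List<Node> inputs) {
      this.name = name;
      this.all = all;
      this.inputs = inputs;
    }

    static Node leaf(String name) { return new Node(name, false, List.of()); }

    static Node union(boolean all, Node... inputs) { return new Node(null, all, List.of(inputs)); }

    boolean isUnion() { return name == null; }
  }

  // Pulls the inputs of nested unions with the same ALL flag up into their parent.
  static Node flatten(Node n) {
    if (!n.isUnion()) {
      return n;
    }
    List<Node> merged = new ArrayList<>();
    for (Node in : n.inputs) {
      Node f = flatten(in);
      if (f.isUnion() && f.all == n.all) {
        merged.addAll(f.inputs); // same semantics: absorb the child's inputs
      } else {
        merged.add(f);
      }
    }
    return new Node(null, n.all, merged);
  }

  // Counts union nodes, i.e. shuffles under the one-shuffle-per-union assumption.
  static int shuffles(Node n) {
    if (!n.isUnion()) {
      return 0;
    }
    int count = 1;
    for (Node in : n.inputs) {
      count += shuffles(in);
    }
    return count;
  }

  public static void main(String[] args) {
    // a UNION b UNION c, built as UNION(a, UNION(b, c))
    Node nested = Node.union(false, Node.leaf("a"),
        Node.union(false, Node.leaf("b"), Node.leaf("c")));
    Node flat = flatten(nested);
    System.out.println(shuffles(nested) + " -> " + shuffles(flat) + ", inputs=" + flat.inputs.size());
  }
}
```

For the example in the issue this prints {{2 -> 1, inputs=3}}. A mixed tree such as UNION ALL(a, UNION(b, c)) is left nested, since deduplicating only {{b}} and {{c}} is not the same as deduplicating all three inputs.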
> Merge multiple input to one BeamUnionRel
> ----------------------------------------
>
> Key: BEAM-7049
> URL: https://issues.apache.org/jira/browse/BEAM-7049
> Project: Beam
> Issue Type: Improvement
> Components: dsl-sql
> Reporter: Rui Wang
> Assignee: sridhar Reddy
> Priority: Major
> Time Spent: 1h
> Remaining Estimate: 0h
>
> BeamUnionRel assumes exactly two inputs and rejects more. So `a UNION b UNION c`
> has to be built as UNION(a, UNION(b, c)), which means two shuffles. If
> BeamUnionRel could handle multiple inputs, there would be only one shuffle.