Hi,

  having some issue with MultiJoinOptimizeBushyRule i want to a question
(before spamming calcite's jira with bug tickets) -
 what can i expect from using the bushy optimizer ? Are there inputs that
it is known/expected to fail with ?

The story behind is we're having issues with MultiJoinOptimizeBushyRule
creating an invalid plan, optimizing a left outer join into a cross join
(and the execution thus producing invalid exploded results)

The query in question is a a simple 4 table query with one outer join.

select *
 from vehicle
 join model on model.A = vehicle.B
 join make on make.A = model.B
 left join motor ON motor.B = vehicle.A

MultiJoinOptimizeBushyRule turns it from

LogicalProject(A=[$0], B=[$1], C=[$2], A0=[$3], B0=[$4], C0=[$5], A1=[$6],
B1=[$7], A2=[$8], B2=[$9], C1=[$10]): rowcount = 1.0, cumulative cost =
11.0, id = 704
  LogicalJoin(condition=[=($9, $0)], joinType=[left]): rowcount = 1.0,
cumulative cost = 10.0, id = 702
    LogicalJoin(condition=[=($6, $4)], joinType=[inner]): rowcount = 1.0,
cumulative cost = 7.0, id = 699
      LogicalJoin(condition=[=($3, $1)], joinType=[inner]): rowcount = 1.0,
cumulative cost = 5.0, id = 696
        EnumerableTableScan(table=[[vehicle]]): rowcount = 3.0, cumulative
cost = 3.0, id = 686
        EnumerableTableScan(table=[[model]]): rowcount = 1.0, cumulative
cost = 1.0, id = 687
      EnumerableTableScan(table=[[make]]): rowcount = 1.0, cumulative cost
= 1.0, id = 689
    EnumerableTableScan(table=[[motor]]): rowcount = 2.0, cumulative cost =
2.0, id = 691

into having a cross join - EnumerableHashJoin(condition=[true],
joinType=[inner]) against the "motor" table

EnumerableProject(A=[$0], B=[$1], C=[$2], A0=[$3], B0=[$4], C0=[$5],
A1=[$6], B1=[$7], A2=[$8], B2=[$9], C1=[$10]): rowcount = 2.0, cumulative
cost = 22.6, id = 777
  EnumerableProject(A1=[$5], B1=[$6], C0=[$7], A=[$0], B=[$1], C=[$2],
A0=[$3], B0=[$4], A2=[$8], B2=[$9], C1=[$10]): rowcount = 2.0, cumulative
cost = 20.6, id = 776
    EnumerableHashJoin(condition=[true], joinType=[inner]): rowcount = 2.0,
cumulative cost = 18.6, id = 775
      EnumerableMergeJoin(condition=[=($0, $6)], joinType=[inner]):
rowcount = 1.0, cumulative cost = 12.15, id = 774
        EnumerableMergeJoin(condition=[=($1, $3)], joinType=[inner]):
rowcount = 1.0, cumulative cost = 5.0, id = 773
          EnumerableTableScan(table=[[model]]): rowcount = 1.0, cumulative
cost = 1.0, id = 687
          EnumerableTableScan(table=[[make]]): rowcount = 1.0, cumulative
cost = 1.0, id = 689
        EnumerableTableScan(table=[[vehicle]]): rowcount = 3.0, cumulative
cost = 3.0, id = 686
      EnumerableTableScan(table=[[motor]]): rowcount = 2.0, cumulative cost
= 2.0, id = 691

That seems like a bug. Or are there some constraints on what join types
does MultiJoinOptimizeBushyRule handle ?

I cloned the calcite repo in an attempt to create a failing test case for a
possible jira ticket, but im struggling with a simple introduction of a
left join into  PlannerTest#testBushy, not sure how this is related to my
problem:

diff --git a/core/src/test/java/org/apache/calcite/tools/PlannerTest.java
b/core/src/test/java/org/apache/calcite/tools/PlannerTest.java
index db9dc9e90..487e99b77 100644
--- a/core/src/test/java/org/apache/calcite/tools/PlannerTest.java
+++ b/core/src/test/java/org/apache/calcite/tools/PlannerTest.java
@@ -1046,7 +1046,7 @@ private void checkHeuristic(String sql, String
expected) throws Exception {
         + "  on s.\"customer_id\" = c.\"customer_id\"\n"
         + "join \"product\" as p\n"
         + "  on s.\"product_id\" = p.\"product_id\"\n"
-        + "join \"product_class\" as pc\n"
+        + "left join \"product_class\" as pc\n"
         + "  on p.\"product_class_id\" = pc.\"product_class_id\"\n"
         + "where c.\"city\" = 'San Francisco'\n"
         + "and p.\"brand_name\" = 'Washington'";

but this blows up in VolcanoPlanner:

java.lang.AssertionError: Type mismatch:
rel rowtype:
RecordType(INTEGER NOT NULL product_id0, INTEGER NOT NULL time_id, INTEGER
NOT NULL customer_id0, INTEGER NOT NULL promotion_id, INTEGER NOT NULL
store_id, DECIMAL(10, 4) NOT NULL store_sales, DECIMAL(10, 4) NOT NULL
store_cost, DECIMAL(10, 4) NOT NULL unit_sales, INTEGER NOT NULL
customer_id, BIGINT NOT NULL account_num, VARCHAR(30) NOT NULL lname,
VARCHAR(30) NOT NULL fname, VARCHAR(30) mi, VARCHAR(30) address1,
VARCHAR(30) address2, VARCHAR(30) address3, VARCHAR(30) address4,
VARCHAR(30) city, VARCHAR(30) state_province, VARCHAR(30) NOT NULL
postal_code, VARCHAR(30) NOT NULL country, INTEGER NOT NULL
customer_region_id, VARCHAR(30) NOT NULL phone1, VARCHAR(30) NOT NULL
phone2, DATE NOT NULL birthdate, VARCHAR(30) NOT NULL marital_status,
VARCHAR(30) NOT NULL yearly_income, VARCHAR(30) NOT NULL gender, SMALLINT
NOT NULL total_children, SMALLINT NOT NULL num_children_at_home,
VARCHAR(30) NOT NULL education, DATE NOT NULL date_accnt_opened,
VARCHAR(30) member_card, VARCHAR(30) occupation, VARCHAR(30) houseowner,
INTEGER num_cars_owned, VARCHAR(60) NOT NULL fullname, INTEGER NOT NULL
product_class_id0, INTEGER NOT NULL product_id, VARCHAR(60) brand_name,
VARCHAR(60) NOT NULL product_name, BIGINT NOT NULL SKU, DECIMAL(10, 4) SRP,
DOUBLE gross_weight, DOUBLE net_weight, BOOLEAN recyclable_package, BOOLEAN
low_fat, SMALLINT units_per_case, SMALLINT cases_per_pallet, DOUBLE
shelf_width, DOUBLE shelf_height, DOUBLE shelf_depth, INTEGER NOT NULL
product_class_id, VARCHAR(30) product_subcategory, VARCHAR(30)
product_category, VARCHAR(30) product_department, VARCHAR(30)
product_family) NOT NULL
equivRel rowtype:
RecordType(INTEGER NOT NULL product_id, INTEGER NOT NULL time_id, INTEGER
NOT NULL customer_id, INTEGER NOT NULL promotion_id, INTEGER NOT NULL
store_id, DECIMAL(10, 4) NOT NULL store_sales, DECIMAL(10, 4) NOT NULL
store_cost, DECIMAL(10, 4) NOT NULL unit_sales, INTEGER NOT NULL
customer_id0, BIGINT NOT NULL account_num, VARCHAR(30) NOT NULL lname,
VARCHAR(30) NOT NULL fname, VARCHAR(30) mi, VARCHAR(30) address1,
VARCHAR(30) address2, VARCHAR(30) address3, VARCHAR(30) address4,
VARCHAR(30) city, VARCHAR(30) state_province, VARCHAR(30) NOT NULL
postal_code, VARCHAR(30) NOT NULL country, INTEGER NOT NULL
customer_region_id, VARCHAR(30) NOT NULL phone1, VARCHAR(30) NOT NULL
phone2, DATE NOT NULL birthdate, VARCHAR(30) NOT NULL marital_status,
VARCHAR(30) NOT NULL yearly_income, VARCHAR(30) NOT NULL gender, SMALLINT
NOT NULL total_children, SMALLINT NOT NULL num_children_at_home,
VARCHAR(30) NOT NULL education, DATE NOT NULL date_accnt_opened,
VARCHAR(30) member_card, VARCHAR(30) occupation, VARCHAR(30) houseowner,
INTEGER num_cars_owned, VARCHAR(60) NOT NULL fullname, INTEGER NOT NULL
product_class_id, INTEGER NOT NULL product_id0, VARCHAR(60) brand_name,
VARCHAR(60) NOT NULL product_name, BIGINT NOT NULL SKU, DECIMAL(10, 4) SRP,
DOUBLE gross_weight, DOUBLE net_weight, BOOLEAN recyclable_package, BOOLEAN
low_fat, SMALLINT units_per_case, SMALLINT cases_per_pallet, DOUBLE
shelf_width, DOUBLE shelf_height, DOUBLE shelf_depth, INTEGER
product_class_id0, VARCHAR(30) product_subcategory, VARCHAR(30)
product_category, VARCHAR(30) product_department, VARCHAR(30)
product_family) NOT NULL

at org.apache.calcite.util.Litmus$1.fail(Litmus.java:31)
at org.apache.calcite.plan.RelOptUtil.equal(RelOptUtil.java:2026)
at
org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:838)
at
org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:868)
at
org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1927)
at
org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:129)
at
org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:236)
at
org.apache.calcite.rel.rules.MultiJoinOptimizeBushyRule.onMatch(MultiJoinOptimizeBushyRule.java:278)
at
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:208)
at
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:631)
at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:327)
at org.apache.calcite.tools.Programs$SequenceProgram.run(Programs.java:346)
at
org.apache.calcite.tools.Programs.lambda$heuristicJoinOrder$1(Programs.java:233)
at org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:357)
at org.apache.calcite.tools.PlannerTest.checkBushy(PlannerTest.java:1157)
at org.apache.calcite.tools.PlannerTest.testBushy(PlannerTest.java:1065)

Thank you,
Karel

Reply via email to