[GitHub] [druid] gianm commented on a change in pull request #9773: Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

GitBox Wed, 06 May 2020 13:09:20 -0700


gianm commented on a change in pull request #9773:
URL: https://github.com/apache/druid/pull/9773#discussion_r421051659




##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
##########
@@ -8346,6 +8346,166 @@ public void testLeftJoinTwoLookupsUsingJoinOperator() 
throws Exception
     );
   }
 
+  @Test
+  public void testInnerJoinTableLookupLookupWithFilterWithOuterLimit() throws 
Exception
+  {
+    testQuery(
+        "SELECT dim1\n"
+        + "FROM foo\n"
+        + "INNER JOIN lookup.lookyloo l ON foo.dim2 = l.k\n"
+        + "INNER JOIN lookup.lookyloo l2 ON foo.dim2 = l2.k\n"
+        + "WHERE l.v = 'xa'\n"
+        + "LIMIT 100\n",
+        ImmutableList.of(
+            newScanQueryBuilder()
+                .dataSource(
+                    join(
+                        join(
+                            new TableDataSource(CalciteTests.DATASOURCE1),
+                            new LookupDataSource("lookyloo"),
+                            "j0.",
+                            
equalsCondition(DruidExpression.fromColumn("dim2"), 
DruidExpression.fromColumn("j0.k")),
+                            JoinType.INNER
+                        ),
+                        new LookupDataSource("lookyloo"),
+                        "_j0.",
+                        equalsCondition(DruidExpression.fromColumn("dim2"), 
DruidExpression.fromColumn("_j0.k")),
+                        JoinType.INNER
+                    )
+                )
+                .intervals(querySegmentSpec(Filtration.eternity()))
+                .limit(100)
+                .filters(selector("j0.v", "xa", null))
+                .columns("dim1")
+                .context(QUERY_CONTEXT_DEFAULT)
+                .build()
+        ),
+        ImmutableList.of(
+            new Object[]{""},
+            new Object[]{"1"}
+        )
+    );
+  }
+
+  @Test
+  public void testInnerJoinTableLookupLookupWithFilterWithoutLimit() throws 
Exception
+  {
+    testQuery(

Review comment:
       Could you also add a super-long version of this (just in case)? Such as:
   
   ```java
           "SELECT dim1\n"
           + "FROM foo\n"
           + "INNER JOIN lookup.lookyloo l ON foo.dim2 = l.k\n"
           + "INNER JOIN lookup.lookyloo l2 ON foo.dim2 = l2.k\n"
           + "INNER JOIN lookup.lookyloo l3 ON foo.dim2 = l3.k\n"
           + "INNER JOIN lookup.lookyloo l4 ON foo.dim2 = l4.k\n"
           + "INNER JOIN lookup.lookyloo l5 ON foo.dim2 = l5.k\n"
           + "INNER JOIN lookup.lookyloo l6 ON foo.dim2 = l6.k\n"
           + "INNER JOIN lookup.lookyloo l7 ON foo.dim2 = l7.k\n"
           + "INNER JOIN lookup.lookyloo l8 ON foo.dim2 = l8.k\n"
           + "INNER JOIN lookup.lookyloo l9 ON foo.dim2 = l9.k\n"
           + "INNER JOIN lookup.lookyloo l10 ON foo.dim2 = l10.k\n"
           + "INNER JOIN lookup.lookyloo l11 ON foo.dim2 = l11.k\n"
           + "INNER JOIN lookup.lookyloo l12 ON foo.dim2 = l12.k\n"
           + "INNER JOIN lookup.lookyloo l13 ON foo.dim2 = l13.k\n"
           + "INNER JOIN lookup.lookyloo l14 ON foo.dim2 = l14.k\n"
           + "INNER JOIN lookup.lookyloo l15 ON foo.dim2 = l15.k\n"
           + "INNER JOIN lookup.lookyloo l16 ON foo.dim2 = l16.k\n"
           + "INNER JOIN lookup.lookyloo l17 ON foo.dim2 = l17.k\n"
           + "INNER JOIN lookup.lookyloo l18 ON foo.dim2 = l18.k\n"
           + "INNER JOIN lookup.lookyloo l19 ON foo.dim2 = l19.k\n"
           + "WHERE l.v = 'xa'\n",
   ```
   
   The reason I'm suggesting this is we want to make sure the approach we're 
using doesn't change behavior as the stack gets deeper.

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQueryRel.java
##########
@@ -183,6 +183,10 @@ public RelWriter explainTerms(final RelWriter pw)
   @Override
   public RelOptCost computeSelfCost(final RelOptPlanner planner, final 
RelMetadataQuery mq)
   {
-    return planner.getCostFactory().makeCost(partialQuery.estimateCost(), 0, 
0);
+    double cost = partialQuery.estimateCost();
+    double multiplier = DruidRels.isScanOrMapping(this, true)

Review comment:
       Could you add a comment about why this is being done here?

##########
File path: 
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidJoinQueryRel.java
##########
@@ -293,23 +293,17 @@ protected RelDataType deriveRowType()
   @Override
   public RelOptCost computeSelfCost(final RelOptPlanner planner, final 
RelMetadataQuery mq)
   {
-    double cost;
-
-    if (computeLeftRequiresSubquery(getSomeDruidChild(left))) {
-      cost = CostEstimates.COST_JOIN_SUBQUERY;
-    } else {
-      cost = partialQuery.estimateCost();
-    }
-
-    if (computeRightRequiresSubquery(getSomeDruidChild(right))) {
-      cost += CostEstimates.COST_JOIN_SUBQUERY;
-    }
-
+    double cost = partialQuery.estimateCost();
     if (joinRel.getCondition().isA(SqlKind.LITERAL) && 
!joinRel.getCondition().isAlwaysFalse()) {
       cost += CostEstimates.COST_JOIN_CROSS;
     }
-
-    return planner.getCostFactory().makeCost(cost, 0, 0);
+    // This is to cancel out the MULTIPLIER_FILTER (value=0.1) from 
partialQuery.estimateCost() to discourage
+    // filter push down if pushing down the filter makes this 
DruidJoinQueryRel not a scan or mapping.
+    // This will leave the filter at the topmost DruidJoinQueryRel (due to the 
order of applying/popping rules).

Review comment:
       This "due to the order of applying/popping rules" sounds a bit sketchy. 
Does it mean that a good plan and bad plan will have equal cost, and we'll pick 
the good plan due to order-based tiebreaking? If so, I'm not sure it would be 
safe to rely on that. Ideally the good plans should be cheaper than the bad 
plans, which ensures we pick them.

##########
File path: 
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidJoinQueryRel.java
##########
@@ -293,23 +293,17 @@ protected RelDataType deriveRowType()
   @Override
   public RelOptCost computeSelfCost(final RelOptPlanner planner, final 
RelMetadataQuery mq)
   {
-    double cost;
-
-    if (computeLeftRequiresSubquery(getSomeDruidChild(left))) {
-      cost = CostEstimates.COST_JOIN_SUBQUERY;
-    } else {
-      cost = partialQuery.estimateCost();
-    }
-
-    if (computeRightRequiresSubquery(getSomeDruidChild(right))) {
-      cost += CostEstimates.COST_JOIN_SUBQUERY;
-    }
-
+    double cost = partialQuery.estimateCost();
     if (joinRel.getCondition().isA(SqlKind.LITERAL) && 
!joinRel.getCondition().isAlwaysFalse()) {
       cost += CostEstimates.COST_JOIN_CROSS;
     }
-
-    return planner.getCostFactory().makeCost(cost, 0, 0);
+    // This is to cancel out the MULTIPLIER_FILTER (value=0.1) from 
partialQuery.estimateCost() to discourage
+    // filter push down if pushing down the filter makes this 
DruidJoinQueryRel not a scan or mapping.
+    // This will leave the filter at the topmost DruidJoinQueryRel (due to the 
order of applying/popping rules).
+    double multiplier = DruidRels.isScanOrMapping(this, true)
+                        ? 1
+                        : 1 / CostEstimates.MULTIPLIER_FILTER;

Review comment:
       It seems like this happens even if there is no filter. Is that right?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] gianm commented on a change in pull request #9773: Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

Reply via email to