Re: [PR] SQL: Plan non-equijoin conditions as cross join followed by filter (druid)

via GitHub Tue, 28 Nov 2023 02:09:33 -0800


clintropolis commented on code in PR #15302:
URL: https://github.com/apache/druid/pull/15302#discussion_r1407509515



##########
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidJoinRule.java:
##########
@@ -397,15 +297,26 @@ static class ConditionAnalysis
      */
     private final List<RexLiteral> literalSubConditions;
 
+    /**
+     * Sub-conditions that cannot be handled by the DruidJoinRule.
+     */
+    private final List<RexNode> unsupportedSubConditions;

Review Comment:
   i can't help but wonder if there is a better name for this collection of 
stuff, 'unsupported' feels a bit off because we don't completely give up if 
there is some stuff in here... but i'm having trouble of thinking of what might 
be better
   
   maybe something about 'non-native' or ... 'indirect' .. or idk naming is the 
worst



##########
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidJoinRule.java:
##########
@@ -222,164 +233,53 @@ private static RexNode makeNullableIfLiteral(final 
RexNode rexNode, final RexBui
   }
 
   /**
-   * Returns whether {@link #analyzeCondition} would return something.
+   * Returns whether we can handle the join condition. In case, some 
conditions in an AND expression are not supported,
+   * they are extracted into a post-join filter instead.
    */
   @VisibleForTesting
-  boolean canHandleCondition(final RexNode condition, final RelDataType 
leftRowType, DruidRel<?> right, final RexBuilder rexBuilder)
-  {
-    return analyzeCondition(condition, leftRowType, right, 
rexBuilder).isPresent();
-  }
-
-  /**
-   * If this condition is an AND of some combination of (1) literals; (2) 
equality conditions of the form
-   * {@code f(LeftRel) = RightColumn}, then return a {@link ConditionAnalysis}.
-   */
-  private Optional<ConditionAnalysis> analyzeCondition(
+  public boolean canHandleCondition(
       final RexNode condition,
       final RelDataType leftRowType,
-      final DruidRel<?> right,
+      DruidRel<?> right,
+      JoinRelType joinType,
+      List<RelDataTypeField> systemFieldList,
       final RexBuilder rexBuilder
   )
   {
-    final List<RexNode> subConditions = decomposeAnd(condition);
-    final List<RexEquality> equalitySubConditions = new ArrayList<>();
-    final List<RexLiteral> literalSubConditions = new ArrayList<>();
-    final int numLeftFields = leftRowType.getFieldCount();
-    final Set<RexInputRef> rightColumns = new HashSet<>();
-
-    for (RexNode subCondition : subConditions) {
-      if (RexUtil.isLiteral(subCondition, true)) {
-        if (subCondition.isA(SqlKind.CAST)) {
-          // This is CAST(literal) which is always OK.
-          // We know that this is CAST(literal) as it passed the check from 
RexUtil.isLiteral
-          RexCall call = (RexCall) subCondition;
-          // We have to verify the types of the cast here, because if the 
underlying literal and the cast output type
-          // are different, then skipping the cast might change the meaning of 
the subcondition.
-          if 
(call.getType().getSqlTypeName().equals(call.getOperands().get(0).getType().getSqlTypeName()))
 {
-            // If the types are the same, unwrap the cast and use the 
underlying literal.
-            literalSubConditions.add((RexLiteral) call.getOperands().get(0));
-          } else {
-            // If the types are not the same, return Optional.empty() 
indicating the condition is not supported.
-            return Optional.empty();
-          }
-        } else {
-          // Literals are always OK.
-          literalSubConditions.add((RexLiteral) subCondition);
-        }
-        continue;
-      }
-
-      RexNode firstOperand;
-      RexNode secondOperand;
-      SqlKind comparisonKind;
-
-      if (subCondition.isA(SqlKind.INPUT_REF)) {
-        firstOperand = rexBuilder.makeLiteral(true);
-        secondOperand = subCondition;
-        comparisonKind = SqlKind.EQUALS;
-
-        if 
(!SqlTypeName.BOOLEAN_TYPES.contains(secondOperand.getType().getSqlTypeName())) 
{
-          plannerContext.setPlanningError(
-              "SQL requires a join with '%s' condition where the column is of 
the type %s, that is not supported",
-              subCondition.getKind(),
-              secondOperand.getType().getSqlTypeName()
-          );
-          return Optional.empty();
-
-        }
-      } else if (subCondition.isA(SqlKind.EQUALS) || 
subCondition.isA(SqlKind.IS_NOT_DISTINCT_FROM)) {
-        final List<RexNode> operands = ((RexCall) subCondition).getOperands();
-        Preconditions.checkState(operands.size() == 2, "Expected 2 operands, 
got[%s]", operands.size());
-        firstOperand = operands.get(0);
-        secondOperand = operands.get(1);
-        comparisonKind = subCondition.getKind();
-      } else {
-        // If it's not EQUALS or a BOOLEAN input ref, it's not supported.
-        plannerContext.setPlanningError(
-            "SQL requires a join with '%s' condition that is not supported.",
-            subCondition.getKind()
-        );
-        return Optional.empty();
-      }
-
-      if (isLeftExpression(firstOperand, numLeftFields) && 
isRightInputRef(secondOperand, numLeftFields)) {
-        equalitySubConditions.add(new RexEquality(firstOperand, (RexInputRef) 
secondOperand, comparisonKind));
-        rightColumns.add((RexInputRef) secondOperand);
-      } else if (isRightInputRef(firstOperand, numLeftFields)
-                 && isLeftExpression(secondOperand, numLeftFields)) {
-        equalitySubConditions.add(new RexEquality(secondOperand, (RexInputRef) 
firstOperand, subCondition.getKind()));
-        rightColumns.add((RexInputRef) firstOperand);
-      } else {
-        // Cannot handle this condition.
-        plannerContext.setPlanningError("SQL is resulting in a join that has 
unsupported operand types.");
-        return Optional.empty();
-      }
-    }
-
+    ConditionAnalysis conditionAnalysis = analyzeCondition(condition, 
leftRowType, rexBuilder);
     // if the right side requires a subquery, then even lookup will be 
transformed to a QueryDataSource
     // thereby allowing join conditions on both k and v columns of the lookup
     if (right != null
         && !DruidJoinQueryRel.computeRightRequiresSubquery(plannerContext, 
DruidJoinQueryRel.getSomeDruidChild(right))
         && right instanceof DruidQueryRel) {
       DruidQueryRel druidQueryRel = (DruidQueryRel) right;
       if (druidQueryRel.getDruidTable().getDataSource() instanceof 
LookupDataSource) {
-        long distinctRightColumns = 
rightColumns.stream().map(RexSlot::getIndex).distinct().count();
+        long distinctRightColumns = 
conditionAnalysis.rightColumns.stream().map(RexSlot::getIndex).distinct().count();
         if (distinctRightColumns > 1) {
           // it means that the join's right side is lookup and the join 
condition contains both key and value columns of lookup.
           // currently, the lookup datasource in the native engine doesn't 
support using value column in the join condition.
           plannerContext.setPlanningError(
               "SQL is resulting in a join involving lookup where value column 
is used in the condition.");
-          return Optional.empty();
+          return false;
         }
       }
     }
 
-    return Optional.of(
-        new ConditionAnalysis(
-            numLeftFields,
-            equalitySubConditions,
-            literalSubConditions
-        )
-    );
-  }
+    if (joinType != JoinRelType.INNER || !systemFieldList.isEmpty() || 
NullHandling.replaceWithDefault()) {
+      // I am not sure in what case, the list of system fields will be not 
empty. I have just picked up this logic
+      // directly from 
https://github.com/apache/calcite/blob/calcite-1.35.0/core/src/main/java/org/apache/calcite/rel/rules/AbstractJoinExtractFilterRule.java#L58

Review Comment:
   what is systemFieldList? i dug around a bit and isnt super obvious to me 🙃 
   
   ```
      * @param systemFieldList  List of system fields that will be prefixed to
      *                         output row type; typically empty but must not be
      *                         null
    ``` 
   or `Returns a list of system fields that will be prefixed to output row 
type.` are all that i can find but i don't quite fully understand what it means



##########
docs/querying/datasource.md:
##########
@@ -335,21 +339,23 @@ SQL joins take the form:
 <o1> [ INNER | LEFT [OUTER] ] JOIN <o2> ON <condition>
 ```
 
-The condition must involve only equalities, but functions are okay, and there 
can be multiple equalities ANDed together.
-Conditions like `t1.x = t2.x`, or `LOWER(t1.x) = t2.x`, or `t1.x = t2.x AND 
t1.y = t2.y` can all be handled. Conditions
-like `t1.x <> t2.x` cannot currently be handled.
+Any condition is accepted, but only certain kinds of conditions execute as 
part of a native join. To execute efficiently
+as part of a native join, a condition must be a single clause like the 
following, or an `AND` of clauses involving at 
+lease one of the following:

Review Comment:
   ```suggestion
   Any condition is accepted, but only certain kinds of conditions execute 
efficiently
   as a native join. The condition must be a single clause like the following, 
or an `AND` of clauses involving at 
   least one of the following:
   ```
   
   though the "or an `AND`" part still reads not super smoothly but i don't 
have any better ideas 😅 



##########
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidJoinRule.java:
##########
@@ -222,164 +233,53 @@ private static RexNode makeNullableIfLiteral(final 
RexNode rexNode, final RexBui
   }
 
   /**
-   * Returns whether {@link #analyzeCondition} would return something.
+   * Returns whether we can handle the join condition. In case, some 
conditions in an AND expression are not supported,
+   * they are extracted into a post-join filter instead.
    */
   @VisibleForTesting
-  boolean canHandleCondition(final RexNode condition, final RelDataType 
leftRowType, DruidRel<?> right, final RexBuilder rexBuilder)
-  {
-    return analyzeCondition(condition, leftRowType, right, 
rexBuilder).isPresent();
-  }
-
-  /**
-   * If this condition is an AND of some combination of (1) literals; (2) 
equality conditions of the form
-   * {@code f(LeftRel) = RightColumn}, then return a {@link ConditionAnalysis}.
-   */
-  private Optional<ConditionAnalysis> analyzeCondition(
+  public boolean canHandleCondition(
       final RexNode condition,
       final RelDataType leftRowType,
-      final DruidRel<?> right,
+      DruidRel<?> right,
+      JoinRelType joinType,
+      List<RelDataTypeField> systemFieldList,
       final RexBuilder rexBuilder
   )
   {
-    final List<RexNode> subConditions = decomposeAnd(condition);
-    final List<RexEquality> equalitySubConditions = new ArrayList<>();
-    final List<RexLiteral> literalSubConditions = new ArrayList<>();
-    final int numLeftFields = leftRowType.getFieldCount();
-    final Set<RexInputRef> rightColumns = new HashSet<>();
-
-    for (RexNode subCondition : subConditions) {
-      if (RexUtil.isLiteral(subCondition, true)) {
-        if (subCondition.isA(SqlKind.CAST)) {
-          // This is CAST(literal) which is always OK.
-          // We know that this is CAST(literal) as it passed the check from 
RexUtil.isLiteral
-          RexCall call = (RexCall) subCondition;
-          // We have to verify the types of the cast here, because if the 
underlying literal and the cast output type
-          // are different, then skipping the cast might change the meaning of 
the subcondition.
-          if 
(call.getType().getSqlTypeName().equals(call.getOperands().get(0).getType().getSqlTypeName()))
 {
-            // If the types are the same, unwrap the cast and use the 
underlying literal.
-            literalSubConditions.add((RexLiteral) call.getOperands().get(0));
-          } else {
-            // If the types are not the same, return Optional.empty() 
indicating the condition is not supported.
-            return Optional.empty();
-          }
-        } else {
-          // Literals are always OK.
-          literalSubConditions.add((RexLiteral) subCondition);
-        }
-        continue;
-      }
-
-      RexNode firstOperand;
-      RexNode secondOperand;
-      SqlKind comparisonKind;
-
-      if (subCondition.isA(SqlKind.INPUT_REF)) {
-        firstOperand = rexBuilder.makeLiteral(true);
-        secondOperand = subCondition;
-        comparisonKind = SqlKind.EQUALS;
-
-        if 
(!SqlTypeName.BOOLEAN_TYPES.contains(secondOperand.getType().getSqlTypeName())) 
{
-          plannerContext.setPlanningError(
-              "SQL requires a join with '%s' condition where the column is of 
the type %s, that is not supported",
-              subCondition.getKind(),
-              secondOperand.getType().getSqlTypeName()
-          );
-          return Optional.empty();
-
-        }
-      } else if (subCondition.isA(SqlKind.EQUALS) || 
subCondition.isA(SqlKind.IS_NOT_DISTINCT_FROM)) {
-        final List<RexNode> operands = ((RexCall) subCondition).getOperands();
-        Preconditions.checkState(operands.size() == 2, "Expected 2 operands, 
got[%s]", operands.size());
-        firstOperand = operands.get(0);
-        secondOperand = operands.get(1);
-        comparisonKind = subCondition.getKind();
-      } else {
-        // If it's not EQUALS or a BOOLEAN input ref, it's not supported.
-        plannerContext.setPlanningError(
-            "SQL requires a join with '%s' condition that is not supported.",
-            subCondition.getKind()
-        );
-        return Optional.empty();
-      }
-
-      if (isLeftExpression(firstOperand, numLeftFields) && 
isRightInputRef(secondOperand, numLeftFields)) {
-        equalitySubConditions.add(new RexEquality(firstOperand, (RexInputRef) 
secondOperand, comparisonKind));
-        rightColumns.add((RexInputRef) secondOperand);
-      } else if (isRightInputRef(firstOperand, numLeftFields)
-                 && isLeftExpression(secondOperand, numLeftFields)) {
-        equalitySubConditions.add(new RexEquality(secondOperand, (RexInputRef) 
firstOperand, subCondition.getKind()));
-        rightColumns.add((RexInputRef) firstOperand);
-      } else {
-        // Cannot handle this condition.
-        plannerContext.setPlanningError("SQL is resulting in a join that has 
unsupported operand types.");
-        return Optional.empty();
-      }
-    }
-
+    ConditionAnalysis conditionAnalysis = analyzeCondition(condition, 
leftRowType, rexBuilder);
     // if the right side requires a subquery, then even lookup will be 
transformed to a QueryDataSource
     // thereby allowing join conditions on both k and v columns of the lookup
     if (right != null
         && !DruidJoinQueryRel.computeRightRequiresSubquery(plannerContext, 
DruidJoinQueryRel.getSomeDruidChild(right))
         && right instanceof DruidQueryRel) {
       DruidQueryRel druidQueryRel = (DruidQueryRel) right;
       if (druidQueryRel.getDruidTable().getDataSource() instanceof 
LookupDataSource) {
-        long distinctRightColumns = 
rightColumns.stream().map(RexSlot::getIndex).distinct().count();
+        long distinctRightColumns = 
conditionAnalysis.rightColumns.stream().map(RexSlot::getIndex).distinct().count();
         if (distinctRightColumns > 1) {
           // it means that the join's right side is lookup and the join 
condition contains both key and value columns of lookup.
           // currently, the lookup datasource in the native engine doesn't 
support using value column in the join condition.
           plannerContext.setPlanningError(
               "SQL is resulting in a join involving lookup where value column 
is used in the condition.");
-          return Optional.empty();
+          return false;
         }
       }
     }
 
-    return Optional.of(
-        new ConditionAnalysis(
-            numLeftFields,
-            equalitySubConditions,
-            literalSubConditions
-        )
-    );
-  }
+    if (joinType != JoinRelType.INNER || !systemFieldList.isEmpty() || 
NullHandling.replaceWithDefault()) {
+      // I am not sure in what case, the list of system fields will be not 
empty. I have just picked up this logic
+      // directly from 
https://github.com/apache/calcite/blob/calcite-1.35.0/core/src/main/java/org/apache/calcite/rel/rules/AbstractJoinExtractFilterRule.java#L58

Review Comment:
   what is systemFieldList? i dug around a bit and isnt super obvious to me 🙃 
   
   ```
      * @param systemFieldList  List of system fields that will be prefixed to
      *                         output row type; typically empty but must not be
      *                         null
    ``` 
   or `Returns a list of system fields that will be prefixed to output row 
type.` are all that i can find but i don't quite fully understand what it means



##########
docs/querying/datasource.md:
##########
@@ -335,21 +339,23 @@ SQL joins take the form:
 <o1> [ INNER | LEFT [OUTER] ] JOIN <o2> ON <condition>
 ```
 
-The condition must involve only equalities, but functions are okay, and there 
can be multiple equalities ANDed together.
-Conditions like `t1.x = t2.x`, or `LOWER(t1.x) = t2.x`, or `t1.x = t2.x AND 
t1.y = t2.y` can all be handled. Conditions
-like `t1.x <> t2.x` cannot currently be handled.
+Any condition is accepted, but only certain kinds of conditions execute as 
part of a native join. To execute efficiently
+as part of a native join, a condition must be a single clause like the 
following, or an `AND` of clauses involving at 
+lease one of the following:

Review Comment:
   ```suggestion
   Any condition is accepted, but only certain kinds of conditions execute 
efficiently
   as a native join. The condition must be a single clause like the following, 
or an `AND` of clauses involving at 
   least one of the following:
   ```
   
   though the "or an `AND`" part still reads not super smoothly but i don't 
have any better ideas 😅 



##########
sql/src/test/java/org/apache/druid/sql/calcite/rule/DruidJoinRuleTest.java:
##########
@@ -85,7 +89,7 @@ public void test_canHandleCondition_leftEqRight()
             ),
             leftType,
             null,
-            rexBuilder
+            JoinRelType.INNER, ImmutableList.of(), rexBuilder

Review Comment:
   nit: args should be on newlines, same with a handful of other parts in this 
file (one of few things style autobots can't detect very well)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] SQL: Plan non-equijoin conditions as cross join followed by filter (druid)

Reply via email to