clintropolis commented on code in PR #15302:
URL: https://github.com/apache/druid/pull/15302#discussion_r1407509515
##########
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidJoinRule.java:
##########
@@ -397,15 +297,26 @@ static class ConditionAnalysis
*/
private final List<RexLiteral> literalSubConditions;
+ /**
+ * Sub-conditions that cannot be handled by the DruidJoinRule.
+ */
+ private final List<RexNode> unsupportedSubConditions;
Review Comment:
i can't help but wonder if there is a better name for this collection of
stuff, 'unsupported' feels a bit off because we don't completely give up if
there is some stuff in here... but i'm having trouble of thinking of what might
be better
maybe something about 'non-native' or ... 'indirect' .. or idk naming is the
worst
##########
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidJoinRule.java:
##########
@@ -222,164 +233,53 @@ private static RexNode makeNullableIfLiteral(final
RexNode rexNode, final RexBui
}
/**
- * Returns whether {@link #analyzeCondition} would return something.
+ * Returns whether we can handle the join condition. In case, some
conditions in an AND expression are not supported,
+ * they are extracted into a post-join filter instead.
*/
@VisibleForTesting
- boolean canHandleCondition(final RexNode condition, final RelDataType
leftRowType, DruidRel<?> right, final RexBuilder rexBuilder)
- {
- return analyzeCondition(condition, leftRowType, right,
rexBuilder).isPresent();
- }
-
- /**
- * If this condition is an AND of some combination of (1) literals; (2)
equality conditions of the form
- * {@code f(LeftRel) = RightColumn}, then return a {@link ConditionAnalysis}.
- */
- private Optional<ConditionAnalysis> analyzeCondition(
+ public boolean canHandleCondition(
final RexNode condition,
final RelDataType leftRowType,
- final DruidRel<?> right,
+ DruidRel<?> right,
+ JoinRelType joinType,
+ List<RelDataTypeField> systemFieldList,
final RexBuilder rexBuilder
)
{
- final List<RexNode> subConditions = decomposeAnd(condition);
- final List<RexEquality> equalitySubConditions = new ArrayList<>();
- final List<RexLiteral> literalSubConditions = new ArrayList<>();
- final int numLeftFields = leftRowType.getFieldCount();
- final Set<RexInputRef> rightColumns = new HashSet<>();
-
- for (RexNode subCondition : subConditions) {
- if (RexUtil.isLiteral(subCondition, true)) {
- if (subCondition.isA(SqlKind.CAST)) {
- // This is CAST(literal) which is always OK.
- // We know that this is CAST(literal) as it passed the check from
RexUtil.isLiteral
- RexCall call = (RexCall) subCondition;
- // We have to verify the types of the cast here, because if the
underlying literal and the cast output type
- // are different, then skipping the cast might change the meaning of
the subcondition.
- if
(call.getType().getSqlTypeName().equals(call.getOperands().get(0).getType().getSqlTypeName()))
{
- // If the types are the same, unwrap the cast and use the
underlying literal.
- literalSubConditions.add((RexLiteral) call.getOperands().get(0));
- } else {
- // If the types are not the same, return Optional.empty()
indicating the condition is not supported.
- return Optional.empty();
- }
- } else {
- // Literals are always OK.
- literalSubConditions.add((RexLiteral) subCondition);
- }
- continue;
- }
-
- RexNode firstOperand;
- RexNode secondOperand;
- SqlKind comparisonKind;
-
- if (subCondition.isA(SqlKind.INPUT_REF)) {
- firstOperand = rexBuilder.makeLiteral(true);
- secondOperand = subCondition;
- comparisonKind = SqlKind.EQUALS;
-
- if
(!SqlTypeName.BOOLEAN_TYPES.contains(secondOperand.getType().getSqlTypeName()))
{
- plannerContext.setPlanningError(
- "SQL requires a join with '%s' condition where the column is of
the type %s, that is not supported",
- subCondition.getKind(),
- secondOperand.getType().getSqlTypeName()
- );
- return Optional.empty();
-
- }
- } else if (subCondition.isA(SqlKind.EQUALS) ||
subCondition.isA(SqlKind.IS_NOT_DISTINCT_FROM)) {
- final List<RexNode> operands = ((RexCall) subCondition).getOperands();
- Preconditions.checkState(operands.size() == 2, "Expected 2 operands,
got[%s]", operands.size());
- firstOperand = operands.get(0);
- secondOperand = operands.get(1);
- comparisonKind = subCondition.getKind();
- } else {
- // If it's not EQUALS or a BOOLEAN input ref, it's not supported.
- plannerContext.setPlanningError(
- "SQL requires a join with '%s' condition that is not supported.",
- subCondition.getKind()
- );
- return Optional.empty();
- }
-
- if (isLeftExpression(firstOperand, numLeftFields) &&
isRightInputRef(secondOperand, numLeftFields)) {
- equalitySubConditions.add(new RexEquality(firstOperand, (RexInputRef)
secondOperand, comparisonKind));
- rightColumns.add((RexInputRef) secondOperand);
- } else if (isRightInputRef(firstOperand, numLeftFields)
- && isLeftExpression(secondOperand, numLeftFields)) {
- equalitySubConditions.add(new RexEquality(secondOperand, (RexInputRef)
firstOperand, subCondition.getKind()));
- rightColumns.add((RexInputRef) firstOperand);
- } else {
- // Cannot handle this condition.
- plannerContext.setPlanningError("SQL is resulting in a join that has
unsupported operand types.");
- return Optional.empty();
- }
- }
-
+ ConditionAnalysis conditionAnalysis = analyzeCondition(condition,
leftRowType, rexBuilder);
// if the right side requires a subquery, then even lookup will be
transformed to a QueryDataSource
// thereby allowing join conditions on both k and v columns of the lookup
if (right != null
&& !DruidJoinQueryRel.computeRightRequiresSubquery(plannerContext,
DruidJoinQueryRel.getSomeDruidChild(right))
&& right instanceof DruidQueryRel) {
DruidQueryRel druidQueryRel = (DruidQueryRel) right;
if (druidQueryRel.getDruidTable().getDataSource() instanceof
LookupDataSource) {
- long distinctRightColumns =
rightColumns.stream().map(RexSlot::getIndex).distinct().count();
+ long distinctRightColumns =
conditionAnalysis.rightColumns.stream().map(RexSlot::getIndex).distinct().count();
if (distinctRightColumns > 1) {
// it means that the join's right side is lookup and the join
condition contains both key and value columns of lookup.
// currently, the lookup datasource in the native engine doesn't
support using value column in the join condition.
plannerContext.setPlanningError(
"SQL is resulting in a join involving lookup where value column
is used in the condition.");
- return Optional.empty();
+ return false;
}
}
}
- return Optional.of(
- new ConditionAnalysis(
- numLeftFields,
- equalitySubConditions,
- literalSubConditions
- )
- );
- }
+ if (joinType != JoinRelType.INNER || !systemFieldList.isEmpty() ||
NullHandling.replaceWithDefault()) {
+ // I am not sure in what case, the list of system fields will be not
empty. I have just picked up this logic
+ // directly from
https://github.com/apache/calcite/blob/calcite-1.35.0/core/src/main/java/org/apache/calcite/rel/rules/AbstractJoinExtractFilterRule.java#L58
Review Comment:
what is systemFieldList? i dug around a bit and isnt super obvious to me 🙃
```
* @param systemFieldList List of system fields that will be prefixed to
* output row type; typically empty but must not be
* null
```
or `Returns a list of system fields that will be prefixed to output row
type.` are all that i can find but i don't quite fully understand what it means
##########
docs/querying/datasource.md:
##########
@@ -335,21 +339,23 @@ SQL joins take the form:
<o1> [ INNER | LEFT [OUTER] ] JOIN <o2> ON <condition>
```
-The condition must involve only equalities, but functions are okay, and there
can be multiple equalities ANDed together.
-Conditions like `t1.x = t2.x`, or `LOWER(t1.x) = t2.x`, or `t1.x = t2.x AND
t1.y = t2.y` can all be handled. Conditions
-like `t1.x <> t2.x` cannot currently be handled.
+Any condition is accepted, but only certain kinds of conditions execute as
part of a native join. To execute efficiently
+as part of a native join, a condition must be a single clause like the
following, or an `AND` of clauses involving at
+lease one of the following:
Review Comment:
```suggestion
Any condition is accepted, but only certain kinds of conditions execute
efficiently
as a native join. The condition must be a single clause like the following,
or an `AND` of clauses involving at
least one of the following:
```
though the "or an `AND`" part still reads not super smoothly but i don't
have any better ideas 😅
##########
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidJoinRule.java:
##########
@@ -222,164 +233,53 @@ private static RexNode makeNullableIfLiteral(final
RexNode rexNode, final RexBui
}
/**
- * Returns whether {@link #analyzeCondition} would return something.
+ * Returns whether we can handle the join condition. In case, some
conditions in an AND expression are not supported,
+ * they are extracted into a post-join filter instead.
*/
@VisibleForTesting
- boolean canHandleCondition(final RexNode condition, final RelDataType
leftRowType, DruidRel<?> right, final RexBuilder rexBuilder)
- {
- return analyzeCondition(condition, leftRowType, right,
rexBuilder).isPresent();
- }
-
- /**
- * If this condition is an AND of some combination of (1) literals; (2)
equality conditions of the form
- * {@code f(LeftRel) = RightColumn}, then return a {@link ConditionAnalysis}.
- */
- private Optional<ConditionAnalysis> analyzeCondition(
+ public boolean canHandleCondition(
final RexNode condition,
final RelDataType leftRowType,
- final DruidRel<?> right,
+ DruidRel<?> right,
+ JoinRelType joinType,
+ List<RelDataTypeField> systemFieldList,
final RexBuilder rexBuilder
)
{
- final List<RexNode> subConditions = decomposeAnd(condition);
- final List<RexEquality> equalitySubConditions = new ArrayList<>();
- final List<RexLiteral> literalSubConditions = new ArrayList<>();
- final int numLeftFields = leftRowType.getFieldCount();
- final Set<RexInputRef> rightColumns = new HashSet<>();
-
- for (RexNode subCondition : subConditions) {
- if (RexUtil.isLiteral(subCondition, true)) {
- if (subCondition.isA(SqlKind.CAST)) {
- // This is CAST(literal) which is always OK.
- // We know that this is CAST(literal) as it passed the check from
RexUtil.isLiteral
- RexCall call = (RexCall) subCondition;
- // We have to verify the types of the cast here, because if the
underlying literal and the cast output type
- // are different, then skipping the cast might change the meaning of
the subcondition.
- if
(call.getType().getSqlTypeName().equals(call.getOperands().get(0).getType().getSqlTypeName()))
{
- // If the types are the same, unwrap the cast and use the
underlying literal.
- literalSubConditions.add((RexLiteral) call.getOperands().get(0));
- } else {
- // If the types are not the same, return Optional.empty()
indicating the condition is not supported.
- return Optional.empty();
- }
- } else {
- // Literals are always OK.
- literalSubConditions.add((RexLiteral) subCondition);
- }
- continue;
- }
-
- RexNode firstOperand;
- RexNode secondOperand;
- SqlKind comparisonKind;
-
- if (subCondition.isA(SqlKind.INPUT_REF)) {
- firstOperand = rexBuilder.makeLiteral(true);
- secondOperand = subCondition;
- comparisonKind = SqlKind.EQUALS;
-
- if
(!SqlTypeName.BOOLEAN_TYPES.contains(secondOperand.getType().getSqlTypeName()))
{
- plannerContext.setPlanningError(
- "SQL requires a join with '%s' condition where the column is of
the type %s, that is not supported",
- subCondition.getKind(),
- secondOperand.getType().getSqlTypeName()
- );
- return Optional.empty();
-
- }
- } else if (subCondition.isA(SqlKind.EQUALS) ||
subCondition.isA(SqlKind.IS_NOT_DISTINCT_FROM)) {
- final List<RexNode> operands = ((RexCall) subCondition).getOperands();
- Preconditions.checkState(operands.size() == 2, "Expected 2 operands,
got[%s]", operands.size());
- firstOperand = operands.get(0);
- secondOperand = operands.get(1);
- comparisonKind = subCondition.getKind();
- } else {
- // If it's not EQUALS or a BOOLEAN input ref, it's not supported.
- plannerContext.setPlanningError(
- "SQL requires a join with '%s' condition that is not supported.",
- subCondition.getKind()
- );
- return Optional.empty();
- }
-
- if (isLeftExpression(firstOperand, numLeftFields) &&
isRightInputRef(secondOperand, numLeftFields)) {
- equalitySubConditions.add(new RexEquality(firstOperand, (RexInputRef)
secondOperand, comparisonKind));
- rightColumns.add((RexInputRef) secondOperand);
- } else if (isRightInputRef(firstOperand, numLeftFields)
- && isLeftExpression(secondOperand, numLeftFields)) {
- equalitySubConditions.add(new RexEquality(secondOperand, (RexInputRef)
firstOperand, subCondition.getKind()));
- rightColumns.add((RexInputRef) firstOperand);
- } else {
- // Cannot handle this condition.
- plannerContext.setPlanningError("SQL is resulting in a join that has
unsupported operand types.");
- return Optional.empty();
- }
- }
-
+ ConditionAnalysis conditionAnalysis = analyzeCondition(condition,
leftRowType, rexBuilder);
// if the right side requires a subquery, then even lookup will be
transformed to a QueryDataSource
// thereby allowing join conditions on both k and v columns of the lookup
if (right != null
&& !DruidJoinQueryRel.computeRightRequiresSubquery(plannerContext,
DruidJoinQueryRel.getSomeDruidChild(right))
&& right instanceof DruidQueryRel) {
DruidQueryRel druidQueryRel = (DruidQueryRel) right;
if (druidQueryRel.getDruidTable().getDataSource() instanceof
LookupDataSource) {
- long distinctRightColumns =
rightColumns.stream().map(RexSlot::getIndex).distinct().count();
+ long distinctRightColumns =
conditionAnalysis.rightColumns.stream().map(RexSlot::getIndex).distinct().count();
if (distinctRightColumns > 1) {
// it means that the join's right side is lookup and the join
condition contains both key and value columns of lookup.
// currently, the lookup datasource in the native engine doesn't
support using value column in the join condition.
plannerContext.setPlanningError(
"SQL is resulting in a join involving lookup where value column
is used in the condition.");
- return Optional.empty();
+ return false;
}
}
}
- return Optional.of(
- new ConditionAnalysis(
- numLeftFields,
- equalitySubConditions,
- literalSubConditions
- )
- );
- }
+ if (joinType != JoinRelType.INNER || !systemFieldList.isEmpty() ||
NullHandling.replaceWithDefault()) {
+ // I am not sure in what case, the list of system fields will be not
empty. I have just picked up this logic
+ // directly from
https://github.com/apache/calcite/blob/calcite-1.35.0/core/src/main/java/org/apache/calcite/rel/rules/AbstractJoinExtractFilterRule.java#L58
Review Comment:
what is systemFieldList? i dug around a bit and isnt super obvious to me 🙃
```
* @param systemFieldList List of system fields that will be prefixed to
* output row type; typically empty but must not be
* null
```
or `Returns a list of system fields that will be prefixed to output row
type.` are all that i can find but i don't quite fully understand what it means
##########
docs/querying/datasource.md:
##########
@@ -335,21 +339,23 @@ SQL joins take the form:
<o1> [ INNER | LEFT [OUTER] ] JOIN <o2> ON <condition>
```
-The condition must involve only equalities, but functions are okay, and there
can be multiple equalities ANDed together.
-Conditions like `t1.x = t2.x`, or `LOWER(t1.x) = t2.x`, or `t1.x = t2.x AND
t1.y = t2.y` can all be handled. Conditions
-like `t1.x <> t2.x` cannot currently be handled.
+Any condition is accepted, but only certain kinds of conditions execute as
part of a native join. To execute efficiently
+as part of a native join, a condition must be a single clause like the
following, or an `AND` of clauses involving at
+lease one of the following:
Review Comment:
```suggestion
Any condition is accepted, but only certain kinds of conditions execute
efficiently
as a native join. The condition must be a single clause like the following,
or an `AND` of clauses involving at
least one of the following:
```
though the "or an `AND`" part still reads not super smoothly but i don't
have any better ideas 😅
##########
sql/src/test/java/org/apache/druid/sql/calcite/rule/DruidJoinRuleTest.java:
##########
@@ -85,7 +89,7 @@ public void test_canHandleCondition_leftEqRight()
),
leftType,
null,
- rexBuilder
+ JoinRelType.INNER, ImmutableList.of(), rexBuilder
Review Comment:
nit: args should be on newlines, same with a handful of other parts in this
file (one of few things style autobots can't detect very well)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]