mihaibudiu commented on code in PR #4814:
URL: https://github.com/apache/calcite/pull/4814#discussion_r2881046133
##########
core/src/main/java/org/apache/calcite/rel/rules/SubQueryRemoveRule.java:
##########
@@ -787,6 +788,60 @@ private static RexNode rewriteIn(RexSubQuery e,
Set<CorrelationId> variablesSet,
// Now the left join
builder.join(JoinRelType.LEFT, builder.and(conditions), variablesSet);
+ // for single-column IN (e.g. deptno IN (select deptno ...)), the CASE
+ // expression above includes:
+ // when ct.ck < ct.c then null -- some RHS value is NULL
+ // this is correct: if no direct match was found (dt.i IS NULL) and any
+ // RHS value is NULL, the answer for every unmatched LHS key is UNKNOWN,
+ // because we cannot rule out that the LHS key equals the NULL.
Review Comment:
equals the NULL -> is NULL.
##########
core/src/main/java/org/apache/calcite/rel/rules/SubQueryRemoveRule.java:
##########
@@ -787,6 +788,60 @@ private static RexNode rewriteIn(RexSubQuery e,
Set<CorrelationId> variablesSet,
// Now the left join
builder.join(JoinRelType.LEFT, builder.and(conditions), variablesSet);
+ // for single-column IN (e.g. deptno IN (select deptno ...)), the CASE
Review Comment:
I think this would be easier to read if you showed first an example of the
LEFT JOIN that is being produced and then followed with these comments.
##########
core/src/main/java/org/apache/calcite/rel/rules/SubQueryRemoveRule.java:
##########
@@ -787,6 +788,60 @@ private static RexNode rewriteIn(RexSubQuery e,
Set<CorrelationId> variablesSet,
// Now the left join
builder.join(JoinRelType.LEFT, builder.and(conditions), variablesSet);
+ // for single-column IN (e.g. deptno IN (select deptno ...)), the CASE
+ // expression above includes:
+ // when ct.ck < ct.c then null -- some RHS value is NULL
+ // this is correct: if no direct match was found (dt.i IS NULL) and any
+ // RHS value is NULL, the answer for every unmatched LHS key is UNKNOWN,
+ // because we cannot rule out that the LHS key equals the NULL.
+ //
+ // for multi-column IN, the global ck < c check is insufficient. Consider:
+ // (empno, deptno) IN (select empno, deptno from ...)
+ // where one RHS row is (7521, null). For each LHS row, the dt join
condition
+ // "empno = rhs.empno AND deptno = rhs.deptno" becomes "empno = 7521 AND
deptno = null":
+ // for LHS empno=7521: (7521 = 7521) AND (7521 = null) = TRUE AND
UNKNOWN = UNKNOWN
Review Comment:
This is still confusing because `=` is used in two ways, maybe you can
replace some `=` to "which reduces to"
##########
core/src/main/java/org/apache/calcite/rel/rules/SubQueryRemoveRule.java:
##########
@@ -826,16 +881,20 @@ private static RexNode rewriteIn(RexSubQuery e,
Set<CorrelationId> variablesSet,
operands.add(builder.isNotNull(builder.field(dtAlias, "cs")),
trueLiteral);
} else {
- operands.add(builder.isNotNull(last(builder.fields())),
+ // Use alias-based field reference so that adding nr JOIN doesn't shift
index
+ operands.add(builder.isNotNull(builder.field(dtAlias, "i")),
trueLiteral);
}
if (!allLiterals) {
switch (logic) {
case TRUE_FALSE_UNKNOWN:
case UNKNOWN_AS_TRUE:
- // only reference ctAlias if we created it
- if (needsNullSafety) {
+ if (needsNullRowJoin) {
+ // nr.i IS NOT NULL: LHS key matched a null-row in RHS → UNKNOWN.
+ operands.add(builder.isNotNull(builder.field(nrAlias, "i")), b);
+ } else if (needsNullSafety) {
Review Comment:
is the original comment valuable? If so, maybe you can move it here
Also, after this PR is merged, the word "traditional" will make no sense
anymore - people who will read the code won't know that it was here before the
other code was added. So I would just remove it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]