iwanttobepowerful commented on code in PR #4375:
URL: https://github.com/apache/calcite/pull/4375#discussion_r2139265466
##########
core/src/main/java/org/apache/calcite/sql2rel/RelDecorrelator.java:
##########
@@ -760,6 +766,136 @@ protected RexNode removeCorrelationExpr(
RelNode newRel = relBuilder.build();
+ for (AggregateCall aggCall : rel.getAggCallList()) {
+ if (aggCall.getAggregation() instanceof SqlCountAggFunction) {
+ parentPropagatesNullValues = false;
+ break;
+ }
+ }
+
+ // Special case where the group by is static (i.e., aggregation functions
without group by).
+ //
+ // When unnesting an Aggregate, we add corVar as an extra groupKey to
rewrite Correlate as JOIN.
+ // For the query:
+ // SELECT SUM(salary), COUNT(name) FROM A;
+ // When table A is empty, it returns [null, 0].
+ // But for
+ // SELECT SUM(salary), COUNT(name) FROM A group by id
+ // When table A is empty, it returns empty. This causes result mismatch.
+ //
+ // We refer to this situation as: `The well-known count bug`,
+ // More details about this issue: Optimization of Nested SQL Queries
Revisited
+ // (https://dl.acm.org/doi/pdf/10.1145/38714.38723)
+ //
+ // To handle this situation, we ensure aggregated result output through
pre-join
+ // Given the following plan:
+ // LogicalCorrelate(correlation=[$cor0], joinType=[inner],
requiredColumns=[{0}])
+ // LogicalProject(DEPTNO=[$0])
+ // LogicalTableScan(table=[[scott, DEPT]])
+ // LogicalProject(EXPR$1=[IS NULL($1)])
+ // LogicalFilter(condition=[=(0, $0)])
+ // LogicalAggregate(group=[{}], EXPR$0=[COUNT()],
EXPR$1=[SUM($0)])
+ // LogicalFilter(condition=[=($cor0.DEPTNO, $7)])
+ // LogicalTableScan(table=[[scott, EMP]])
Review Comment:
plz add sql stmt for ref
##########
core/src/main/java/org/apache/calcite/sql2rel/RelDecorrelator.java:
##########
@@ -760,6 +766,136 @@ protected RexNode removeCorrelationExpr(
RelNode newRel = relBuilder.build();
+ for (AggregateCall aggCall : rel.getAggCallList()) {
+ if (aggCall.getAggregation() instanceof SqlCountAggFunction) {
+ parentPropagatesNullValues = false;
+ break;
+ }
+ }
+
+ // Special case where the group by is static (i.e., aggregation functions
without group by).
+ //
+ // When unnesting an Aggregate, we add corVar as an extra groupKey to
rewrite Correlate as JOIN.
+ // For the query:
+ // SELECT SUM(salary), COUNT(name) FROM A;
+ // When table A is empty, it returns [null, 0].
+ // But for
+ // SELECT SUM(salary), COUNT(name) FROM A group by id
+ // When table A is empty, it returns empty. This causes result mismatch.
+ //
+ // We refer to this situation as: `The well-known count bug`,
+ // More details about this issue: Optimization of Nested SQL Queries
Revisited
+ // (https://dl.acm.org/doi/pdf/10.1145/38714.38723)
+ //
+ // To handle this situation, we ensure aggregated result output through
pre-join
+ // Given the following plan:
+ // LogicalCorrelate(correlation=[$cor0], joinType=[inner],
requiredColumns=[{0}])
+ // LogicalProject(DEPTNO=[$0])
+ // LogicalTableScan(table=[[scott, DEPT]])
+ // LogicalProject(EXPR$1=[IS NULL($1)])
+ // LogicalFilter(condition=[=(0, $0)])
+ // LogicalAggregate(group=[{}], EXPR$0=[COUNT()],
EXPR$1=[SUM($0)])
+ // LogicalFilter(condition=[=($cor0.DEPTNO, $7)])
+ // LogicalTableScan(table=[[scott, EMP]])
+ //
+ // The regular rewrite as:
+ // LogicalJoin(condition=[=($0, $2)], joinType=[inner])
+ // LogicalProject(DEPTNO=[$0])
+ // LogicalTableScan(table=[[scott, DEPT]])
+ // LogicalProject(EXPR$1=[IS NULL($2)], DEPTNO=[$0])
+ // LogicalFilter(condition=[=(0, $1)])
+ // LogicalAggregate(group=[{0}], EXPR$0=[COUNT()],
EXPR$1=[SUM($0)])
+ // LogicalProject(DEPTNO=[$7])
+ // LogicalFilter(condition=[IS NOT NULL($7)])
+ // LogicalTableScan(table=[[scott, EMP]])
+ // It will causes rows with `count=0` to be filtered out, and IS NULL($2)
will return null
+ // instead of true. Therefore, we use LEFT JOIN to ensure that
+ // correlation fields (extra group by key) always returns the aggregation
result.
+ //
+ // Rewrite Aggregate as:
+ // LogicalProject(DEPTNO=[$0], EXPR$0=[CASE(IS NOT NULL($2), $2, 0),
EXPR$1=[$(1)])
Review Comment:
maybe parenthesis '[' not matching
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]