vbarua commented on PR #3845: URL: https://github.com/apache/calcite/pull/3845#issuecomment-2211519882
This interacts interestingly/poorly with the [IntersectToDistinctRule](https://github.com/apache/calcite/blob/8a96095a64bc1cc955438d219d5f1fbcbc5762b7/core/src/main/java/org/apache/calcite/rel/rules/IntersectToDistinctRule.java), which rewrites Intersects to Unions using something like:

```
LogicalIntersect(all=[false])
  LogicalProject(a=[$0], b=[$1], c=[$2])
    LogicalTableScan(table=[[CATALOG, TEMP, FOO]])
  LogicalProject(a=[$0], b=[$1], c=[$2])
    LogicalTableScan(table=[[CATALOG, TEMP, BAR]])
```

which becomes

```
LogicalProject(a=[$0], b=[$1], c=[$2])
  LogicalFilter(condition=[=($9, 2)])
    LogicalAggregate(group=[{0, 1, 2}], agg#0=[COUNT()])
      LogicalUnion(all=[true])
        LogicalAggregate(group=[{0, 1, 2}], agg#0=[COUNT()])
          LogicalProject(a=[$0], b=[$1], c=[$2])
            LogicalTableScan(table=[[CATALOG, TEMP, FOO]])
        LogicalAggregate(group=[{0, 1, 2}], agg#0=[COUNT()])
          LogicalProject(a=[$0], b=[$1], c=[$2])
            LogicalTableScan(table=[[CATALOG, TEMP, BAR]])
```

What's interesting about this is that while it returns the same results, it's actually somewhat lossy with regards to the nullability information because we can't infer as tight nullability bounds for the second form as we can the first. That is to say, if `FOO.a` is nullable but `BAR.a` is not nullable, then with the first form we can say that in the output `a` will not be nullable but this is not the case with the second form.

One way I could see to work around this would be to improve the [IntersectToDistinctRule](https://github.com/apache/calcite/blob/8a96095a64bc1cc955438d219d5f1fbcbc5762b7/core/src/main/java/org/apache/calcite/rel/rules/IntersectToDistinctRule.java) by including filtering information when possible.
That is, when a column is non-nullable in at least one Intersect branch, we add an `IS NOT NULL` filter on that column to the branches where it is nullable:

```
LogicalProject(a=[$0], b=[$1], c=[$2])
  LogicalFilter(condition=[=($9, 2)])
    LogicalAggregate(group=[{0, 1, 2}], agg#0=[COUNT()])
      LogicalUnion(all=[true])
        LogicalAggregate(group=[{0, 1, 2}], agg#0=[COUNT()])
          LogicalFilter(condition=[IS NOT NULL($0)])  <-- Filter here excludes rows where a is NULL
            LogicalProject(a=[$0], b=[$1], c=[$2])
              LogicalTableScan(table=[[CATALOG, TEMP, FOO]])
        LogicalAggregate(group=[{0, 1, 2}], agg#0=[COUNT()])
          LogicalProject(a=[$0], b=[$1], c=[$2])
            LogicalTableScan(table=[[CATALOG, TEMP, BAR]])
```

However, that might also require additional work, because I don't believe that Calcite can use the presence of an `IS NOT NULL` filter to change the output nullability.

The reason this is a problem now is that when the IntersectToDistinctRule is applied, the types of the RelNode going in and the RelNode coming out no longer match.
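
To make the type mismatch concrete, here is a toy model of the nullability reasoning described in this comment. It is only a sketch and does not use Calcite's API: the `RowType` alias and the functions `intersect_type`, `union_type` and `not_null_filter_type` are hypothetical names invented for illustration. The last function models the narrowing that, as far as I can tell, Calcite does not perform today.

```python
from typing import Dict, List

# Toy row type (hypothetical, not a Calcite class):
# column name -> "may this column contain NULL?"
RowType = Dict[str, bool]


def intersect_type(inputs: List[RowType]) -> RowType:
    # A row is emitted only if it appears in every input, so a column can
    # be NULL in the output only if it is nullable in all inputs.
    return {col: all(t[col] for t in inputs) for col in inputs[0]}


def union_type(inputs: List[RowType]) -> RowType:
    # A row from any input may be emitted, so one nullable input is enough
    # to make the output column nullable.
    return {col: any(t[col] for t in inputs) for col in inputs[0]}


def not_null_filter_type(input_type: RowType, col: str) -> RowType:
    # Assumed narrowing: an IS NOT NULL filter on `col` marks it non-nullable.
    narrowed = dict(input_type)
    narrowed[col] = False
    return narrowed


foo = {"a": True, "b": True, "c": True}    # FOO.a is nullable
bar = {"a": False, "b": True, "c": True}   # BAR.a is not nullable

# The aggregates, count filter and final projection in the rewritten plan are
# assumed to leave the nullability of a, b and c unchanged, so they are omitted.
original = intersect_type([foo, bar])
rewritten = union_type([foo, bar])
rewritten_with_filter = union_type([not_null_filter_type(foo, "a"), bar])

print(original["a"], rewritten["a"], rewritten_with_filter["a"])
```

In this model the Intersect form derives `a` as non-nullable while the plain Union form derives it as nullable, which is the mismatch in question. The two row types only agree again if the `IS NOT NULL` filter is allowed to narrow the column's nullability.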
