Re: [PR] [CALCITE-6451] Refine nullability of outputs for MINUS and INTERSECT [calcite]

via GitHub Fri, 05 Jul 2024 17:02:42 -0700


vbarua commented on PR #3845:
URL: https://github.com/apache/calcite/pull/3845#issuecomment-2211519882


   This interacts interestingly/poorly with the 
[IntersectToDistinctRule](https://github.com/apache/calcite/blob/8a96095a64bc1cc955438d219d5f1fbcbc5762b7/core/src/main/java/org/apache/calcite/rel/rules/IntersectToDistinctRule.java),
 which rewrites Intersects to Unions using something like:
   
   ```
   LogicalIntersect(all=[false])
     LogicalProject(a=[$0], b=[$1], c=[$2])
       LogicalTableScan(table=[[CATALOG, TEMP, FOO]])
     LogicalProject(a=[$0], b=[$1], c=[$2])
       LogicalTableScan(table=[[CATALOG, TEMP, BAR]])
   ```
   which becomes
   ```
   LogicalProject(a=[$0], b=[$1], c=[$2])
     LogicalFilter(condition=[=($3, 2)])
       LogicalAggregate(group=[{0, 1, 2}], agg#0=[COUNT()])
         LogicalUnion(all=[true])
           LogicalAggregate(group=[{0, 1, 2}], agg#0=[COUNT()])
             LogicalProject(a=[$0], b=[$1], c=[$2])
               LogicalTableScan(table=[[CATALOG, TEMP, FOO]])
           LogicalAggregate(group=[{0, 1, 2}], agg#0=[COUNT()])
             LogicalProject(a=[$0], b=[$1], c=[$2])
               LogicalTableScan(table=[[CATALOG, TEMP, BAR]])
   ```
   
   What's interesting is that, although the rewritten plan returns the same 
results, it loses nullability information: we cannot infer nullability bounds 
for the second form that are as tight as those for the first. For example, if 
`FOO.a` is nullable but `BAR.a` is not, then in the first form we can say that 
`a` is not nullable in the output, because every output row must also appear in 
`BAR`. In the second form we cannot, because the output of the `LogicalUnion` 
has to allow NULLs in `a` whenever any of its inputs does.
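
The nullability argument above can be sketched in plain Java. This is only an illustration and not Calcite code: the class `SetOpNullability` and its methods are hypothetical names, and each column is reduced to a single "is nullable" flag per input.

```java
import java.util.List;

/** Hypothetical sketch of set-operation nullability; not part of Calcite. */
final class SetOpNullability {
  /**
   * INTERSECT: an output row must appear in every input, so a column can
   * only be NULL in the output if it is nullable in every input.
   */
  static boolean intersectNullable(List<Boolean> inputNullable) {
    return inputNullable.stream().allMatch(n -> n);
  }

  /**
   * UNION: an output row may come from any input, so a column is nullable
   * in the output as soon as it is nullable in one input.
   */
  static boolean unionNullable(List<Boolean> inputNullable) {
    return inputNullable.stream().anyMatch(n -> n);
  }

  public static void main(String[] args) {
    // Column a: nullable in FOO, NOT NULL in BAR.
    List<Boolean> a = List.of(true, false);
    System.out.println("INTERSECT: a nullable = " + intersectNullable(a)); // false
    System.out.println("UNION ALL: a nullable = " + unionNullable(a));     // true
  }
}
```

The two results differ for the same inputs, which is the information the Union-based form loses.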
   
   One way to work around this would be to improve the 
[IntersectToDistinctRule](https://github.com/apache/calcite/blob/8a96095a64bc1cc955438d219d5f1fbcbc5762b7/core/src/main/java/org/apache/calcite/rel/rules/IntersectToDistinctRule.java)
 by adding filters where possible. That is, when a column is non-nullable in at 
least one Intersect branch (so it cannot be NULL in the output), we add an 
`IS NOT NULL` filter on that column to each branch where it is nullable:
   ```
   LogicalProject(a=[$0], b=[$1], c=[$2])
     LogicalFilter(condition=[=($3, 2)])
       LogicalAggregate(group=[{0, 1, 2}], agg#0=[COUNT()])
         LogicalUnion(all=[true])
           LogicalAggregate(group=[{0, 1, 2}], agg#0=[COUNT()])
             LogicalFilter(condition=[IS NOT NULL($0)])  <-- excludes rows where a is NULL
               LogicalProject(a=[$0], b=[$1], c=[$2])
                 LogicalTableScan(table=[[CATALOG, TEMP, FOO]])
           LogicalAggregate(group=[{0, 1, 2}], agg#0=[COUNT()])
             LogicalProject(a=[$0], b=[$1], c=[$2])
               LogicalTableScan(table=[[CATALOG, TEMP, BAR]])
   ```
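
As a rough sketch of how a rule might decide where such filters go, the plain-Java example below takes a per-branch, per-column nullability matrix and returns the column ordinals that would need an `IS NOT NULL` filter in each branch. The class `NotNullFilterPlanner` and its methods are hypothetical and do not use any Calcite API; an actual change to the rule would presumably work on the inputs' row types instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the filter placement idea; not part of Calcite. */
final class NotNullFilterPlanner {
  /**
   * nullable[b][c] says whether column c is nullable in Intersect branch b.
   * Returns, for each branch, the column ordinals that would need an
   * IS NOT NULL filter: columns that are nullable in this branch but
   * non-nullable in at least one other branch.
   */
  static List<List<Integer>> columnsToFilter(boolean[][] nullable) {
    int columnCount = nullable[0].length;
    List<List<Integer>> result = new ArrayList<>();
    for (boolean[] branch : nullable) {
      List<Integer> columns = new ArrayList<>();
      for (int c = 0; c < columnCount; c++) {
        if (branch[c] && someBranchNotNull(nullable, c)) {
          columns.add(c);
        }
      }
      result.add(columns);
    }
    return result;
  }

  /** True if column c is declared NOT NULL in at least one branch. */
  private static boolean someBranchNotNull(boolean[][] nullable, int c) {
    for (boolean[] branch : nullable) {
      if (!branch[c]) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    boolean[][] nullable = {
        {true, false, true},   // FOO: a nullable, b NOT NULL, c nullable
        {false, false, true},  // BAR: a NOT NULL, b NOT NULL, c nullable
    };
    // Only FOO needs a filter, and only on column 0 (a).
    System.out.println(columnsToFilter(nullable)); // [[0], []]
  }
}
```

Column `c` gets no filter in either branch because it is nullable everywhere, so NULLs in `c` can legitimately reach the output.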
   
   However, that might also require additional work, because I don't believe 
that Calcite can use the presence of an `IS NOT NULL` filter to change the 
output nullability.
   
   The reason this is a problem now is that when the IntersectToDistinctRule 
is applied, the types of the RelNode going in and the RelNode coming out no 
longer match.


