Re: [PR] [CALCITE-7174] Improve lossless cast detection for numeric types [calcite]

via GitHub Sat, 27 Sep 2025 08:13:55 -0700


xiedeyantu commented on code in PR #4557:
URL: https://github.com/apache/calcite/pull/4557#discussion_r2384206942



##########
core/src/test/resources/sql/planner.iq:
##########
@@ -223,15 +224,16 @@ select a from (values (1.0), (4.0), (null)) as t3 (a);
 
 !ok
 
-EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], A=[$t1])
-  EnumerableNestedLoopJoin(condition=[OR(AND(IS NULL(CAST($0):DECIMAL(11, 1)), IS NULL(CAST($1):DECIMAL(11, 1))), =(CAST($0):DECIMAL(11, 1), CAST($1):DECIMAL(11, 1)))], joinType=[anti])
-    EnumerableAggregate(group=[{0}])
-      EnumerableNestedLoopJoin(condition=[=(CAST($0):DECIMAL(11, 1) NOT NULL, CAST($1):DECIMAL(11, 1) NOT NULL)], joinType=[anti])
-        EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT NULL], A=[$t1])
-          EnumerableValues(tuples=[[{ 1.0 }, { 2.0 }, { 3.0 }, { 4.0 }, { 5.0 }]])
-        EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT NULL], A=[$t1])
-          EnumerableValues(tuples=[[{ 1 }, { 2 }]])
-    EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], A=[$t1])
+EnumerableCalc(expr#0..1=[{inputs}], expr#2=[CAST($t0):DECIMAL(11, 1)], A=[$t2])
+  EnumerableHashJoin(condition=[=($1, $3)], joinType=[anti])
+    EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], proj#0..1=[{exprs}])
+      EnumerableAggregate(group=[{0}])
+        EnumerableNestedLoopJoin(condition=[=(CAST($0):DECIMAL(11, 1) NOT NULL, CAST($1):DECIMAL(11, 1) NOT NULL)], joinType=[anti])
+          EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT NULL], A=[$t1])
+            EnumerableValues(tuples=[[{ 1.0 }, { 2.0 }, { 3.0 }, { 4.0 }, { 5.0 }]])
+          EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1) NOT NULL], A=[$t1])
+            EnumerableValues(tuples=[[{ 1 }, { 2 }]])
+    EnumerableCalc(expr#0=[{inputs}], expr#1=[CAST($t0):DECIMAL(11, 1)], A=[$t1], A0=[$t1])

Review Comment:
   @asolimando Thank you very much for the explanation. After careful
debugging, I found that the change attributed to `IntersectToSemiJoinRule` is
only a surface symptom. The root cause is that you implemented lossless cast
detection for the `DECIMAL` type, which strengthens expression simplification.
For example,
`AND(IS NULL(CAST($0):DECIMAL(11, 1)), IS NULL(CAST($1):DECIMAL(11, 1)))`
can now be simplified to `false`, which lets `JoinPushExpressionsRule` apply
successfully.
   The issue is not really about multiple casts in `IntersectToSemiJoinRule`.
I believe that even when the same `cast` is repeated several times, the
repeated casts should still be eliminated successfully. That said, I agree with
your point that generating `cast`s at every layer in `IntersectToSemiJoinRule`
should be optimized; it complicates the plan unnecessarily and should be
addressed at another level.
   Finally, I want to say that your PR is truly very useful. If there are no
further comments, I look forward to seeing this PR merged as soon as possible.
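
To make the lossless-cast argument above concrete, here is a minimal Java sketch of the kind of check involved for `DECIMAL` to `DECIMAL` casts. It is an illustration only and not Calcite's actual implementation: the class `LosslessCastSketch` and the method `isLosslessDecimalCast` are hypothetical names, and the `DECIMAL(2, 1)` source type in the example is an assumption. The idea is that a cast loses nothing when the target keeps at least as many fractional digits and at least as many integer digits as the source. When that holds, `IS NULL(CAST(x))` is equivalent to `IS NULL(x)`, which in turn folds to `false` if `x` is known to be non-nullable.

```java
public class LosslessCastSketch {
    /**
     * Hypothetical helper: returns true when every DECIMAL(fromPrecision, fromScale)
     * value can be represented unchanged as DECIMAL(toPrecision, toScale).
     */
    static boolean isLosslessDecimalCast(
            int fromPrecision, int fromScale, int toPrecision, int toScale) {
        // The target must keep every fractional digit of the source.
        boolean keepsFraction = toScale >= fromScale;
        // The target must keep every integer digit of the source.
        boolean keepsInteger = (toPrecision - toScale) >= (fromPrecision - fromScale);
        return keepsFraction && keepsInteger;
    }

    public static void main(String[] args) {
        // Assumed source type DECIMAL(2, 1) widened to DECIMAL(11, 1): nothing is lost.
        System.out.println(isLosslessDecimalCast(2, 1, 11, 1));  // true
        // Narrowing DECIMAL(11, 1) to DECIMAL(2, 1) may overflow the integer part.
        System.out.println(isLosslessDecimalCast(11, 1, 2, 1));  // false
        // Reducing the scale from 2 to 1 may round away a fractional digit.
        System.out.println(isLosslessDecimalCast(3, 2, 11, 1));  // false
    }
}
```

The real rule in Calcite covers more type combinations and may use different conditions; this sketch only shows why a widening decimal cast can be dropped during simplification.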



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
