[GitHub] [hive] soumyakanti3578 commented on a diff in pull request #3359: HIVE-26312: Use default digest normalization strategy in CBO

GitBox Fri, 26 Aug 2022 11:50:41 -0700


soumyakanti3578 commented on code in PR #3359:
URL: https://github.com/apache/hive/pull/3359#discussion_r956341300



##########
ql/src/test/results/clientpositive/llap/vector_interval_2.q.out:
##########
@@ -1350,16 +1350,16 @@ STAGE PLANS:
             Map Operator Tree:
                 TableScan
                   alias: vector_interval_2
-                  filterExpr: (CAST( str1 AS INTERVAL YEAR TO MONTH) is not 
null and ((dt + CAST( str1 AS INTERVAL YEAR TO MONTH)) = DATE'2002-03-01') and 
((dt + INTERVAL'1-2') = DATE'2002-03-01') and (dt <> (dt + CAST( str1 AS 
INTERVAL YEAR TO MONTH))) and (dt <> (dt + INTERVAL'1-2'))) (type: boolean)
+                  filterExpr: (((dt + CAST( str1 AS INTERVAL YEAR TO MONTH)) = 
DATE'2002-03-01') and ((dt + INTERVAL'1-2') = DATE'2002-03-01') and (dt <> (dt 
+ CAST( str1 AS INTERVAL YEAR TO MONTH))) and (dt <> (dt + INTERVAL'1-2'))) 
(type: boolean)

Review Comment:
   I think I figured out why `is not null` is missing here. I tried to 
reproduce this with the following test:
   
   > explain vectorization expression
   select ts from vector_interval_2
   where
     timestamp '2001-01-01 01:02:03' <= dt + interval '0 1:2:3' day to second
     and dt + interval '0 1:2:3' day to second = timestamp '2001-01-01 01:02:03'
     and dt + interval '0 1:2:3' day to second >= timestamp '2001-01-01 
01:02:03'
   order by ts;
   
   With default value for `CalciteSystemProperty.ENABLE_REX_DIGEST_NORMALIZE`, 
we can see 
[here](https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rex/RexNormalize.java#L123)
 that `<, <=` are preferred over `>, >=`, i.e., if possible all `>, >=` are 
reversed. Now, the `AND` clause contains expressions like `(A<B) and (C<=D) and 
(B>A) and (D>=C)`, for example, just above this diff, there's the diff on line 
1172 which essentially has:
   >str3 <= str4 and str3 < str4 and str4 >= str3 and str4 > str3
   
   and we can see that the second half is pruned as it is exactly the same as 
the first half. These duplicate expressions are not getting added in the 
`digests` 
[here](https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rex/RexUtil.java#L1142).
 `digests.add` eventually leads to  `RexNormalize.hashcode` method, which 
reverses the `>, >=` rexnodes and returns same hashcode as the `<, <=` 
expressions and thus they are not added to `digests`. 
   
   Now, when `CalciteSystemProperty.ENABLE_REX_DIGEST_NORMALIZE` is set to 
`false`, we get all these duplicate expressions which are not pruned, and in 
`RexSimplify.simplifyUsingPredicates` method, we calculate `range2` 
[here](https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rex/RexSimplify.java#L1631),
 by calling `residue` and passing the list of predicates 
`predicates.pulledUpPredicates`. This list contains all the predicates that 
have been seen before the current predicate. So when there are duplicates and 
we see the same predicate for the second time, `residue` returns `Range.all()` 
from 
[here](https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rex/RexSimplify.java#L1695),
 and this results in a `is not null` 
[here](https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rex/RexSimplify.java#L1643).
   
   I think all the changes in this commit are fine as all of them are because 
of pruning of duplicate predicates. What do you think @zabetak ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [hive] soumyakanti3578 commented on a diff in pull request #3359: HIVE-26312: Use default digest normalization strategy in CBO

Reply via email to