soumyakanti3578 commented on code in PR #3359:
URL: https://github.com/apache/hive/pull/3359#discussion_r956341300
##########
ql/src/test/results/clientpositive/llap/vector_interval_2.q.out:
##########
@@ -1350,16 +1350,16 @@ STAGE PLANS:
Map Operator Tree:
TableScan
alias: vector_interval_2
- filterExpr: (CAST( str1 AS INTERVAL YEAR TO MONTH) is not
null and ((dt + CAST( str1 AS INTERVAL YEAR TO MONTH)) = DATE'2002-03-01') and
((dt + INTERVAL'1-2') = DATE'2002-03-01') and (dt <> (dt + CAST( str1 AS
INTERVAL YEAR TO MONTH))) and (dt <> (dt + INTERVAL'1-2'))) (type: boolean)
+ filterExpr: (((dt + CAST( str1 AS INTERVAL YEAR TO MONTH)) =
DATE'2002-03-01') and ((dt + INTERVAL'1-2') = DATE'2002-03-01') and (dt <> (dt
+ CAST( str1 AS INTERVAL YEAR TO MONTH))) and (dt <> (dt + INTERVAL'1-2')))
(type: boolean)
Review Comment:
I think I figured out why `is not null` is missing here. I tried to
reproduce this with the following test:
> explain vectorization expression
select ts from vector_interval_2
where
timestamp '2001-01-01 01:02:03' <= dt + interval '0 1:2:3' day to second
and dt + interval '0 1:2:3' day to second = timestamp '2001-01-01 01:02:03'
and dt + interval '0 1:2:3' day to second >= timestamp '2001-01-01
01:02:03'
order by ts;
With default value for `CalciteSystemProperty.ENABLE_REX_DIGEST_NORMALIZE`,
we can see
[here](https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rex/RexNormalize.java#L123)
that `<, <=` are preferred over `>, >=`, i.e., if possible all `>, >=` are
reversed. Now, the `AND` clause contains expressions like `(A<B) and (C<=D) and
(B>A) and (D>=C)`, for example, just above this diff, there's the diff on line
1172 which essentially has:
>str3 <= str4 and str3 < str4 and str4 >= str3 and str4 > str3
and we can see that the second half is pruned as it is exactly the same as
the first half. These duplicate expressions are not getting added in the
`digests`
[here](https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rex/RexUtil.java#L1142).
`digests.add` eventually leads to `RexNormalize.hashcode` method, which
reverses the `>, >=` rexnodes and returns same hashcode as the `<, <=`
expressions and thus they are not added to `digests`.
Now, when `CalciteSystemProperty.ENABLE_REX_DIGEST_NORMALIZE` is set to
`false`, we get all these duplicate expressions which are not pruned, and in
`RexSimplify.simplifyUsingPredicates` method, we calculate `range2`
[here](https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rex/RexSimplify.java#L1631),
by calling `residue` and passing the list of predicates
`predicates.pulledUpPredicates`. This list contains all the predicates that
have been seen before the current predicate. So when there are duplicates and
we see the same predicate for the second time, `residue` returns `Range.all()`
from
[here](https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rex/RexSimplify.java#L1695),
and this results in a `is not null`
[here](https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rex/RexSimplify.java#L1643).
I think all the changes in this commit are fine as all of them are because
of pruning of duplicate predicates. What do you think @zabetak ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]