[
https://issues.apache.org/jira/browse/HIVE-26312?focusedWorklogId=804056&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-804056
]
ASF GitHub Bot logged work on HIVE-26312:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Aug/22 18:50
Start Date: 26/Aug/22 18:50
Worklog Time Spent: 10m
Work Description: soumyakanti3578 commented on code in PR #3359:
URL: https://github.com/apache/hive/pull/3359#discussion_r956341300
##########
ql/src/test/results/clientpositive/llap/vector_interval_2.q.out:
##########
@@ -1350,16 +1350,16 @@ STAGE PLANS:
Map Operator Tree:
TableScan
alias: vector_interval_2
- filterExpr: (CAST( str1 AS INTERVAL YEAR TO MONTH) is not
null and ((dt + CAST( str1 AS INTERVAL YEAR TO MONTH)) = DATE'2002-03-01') and
((dt + INTERVAL'1-2') = DATE'2002-03-01') and (dt <> (dt + CAST( str1 AS
INTERVAL YEAR TO MONTH))) and (dt <> (dt + INTERVAL'1-2'))) (type: boolean)
+ filterExpr: (((dt + CAST( str1 AS INTERVAL YEAR TO MONTH)) =
DATE'2002-03-01') and ((dt + INTERVAL'1-2') = DATE'2002-03-01') and (dt <> (dt
+ CAST( str1 AS INTERVAL YEAR TO MONTH))) and (dt <> (dt + INTERVAL'1-2')))
(type: boolean)
Review Comment:
I think I figured out why `is not null` is missing here. I tried to
reproduce this with the following test:
> explain vectorization expression
select ts from vector_interval_2
where
timestamp '2001-01-01 01:02:03' <= dt + interval '0 1:2:3' day to second
and dt + interval '0 1:2:3' day to second = timestamp '2001-01-01 01:02:03'
and dt + interval '0 1:2:3' day to second >= timestamp '2001-01-01
01:02:03'
order by ts;
With default value for `CalciteSystemProperty.ENABLE_REX_DIGEST_NORMALIZE`,
we can see
[here](https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rex/RexNormalize.java#L123)
that `<, <=` are preferred over `>, >=`, i.e., if possible all `>, >=` are
reversed. Now, the `AND` clause contains expressions like `(A<B) and (C<=D) and
(B>A) and (D>=C)`, for example, just above this diff, there's the diff on line
1172 which essentially has:
>str3 <= str4 and str3 < str4 and str4 >= str3 and str4 > str3
and we can see that the second half is pruned as it is exactly the same as
the first half. These duplicate expressions are not getting added in the
`digests`
[here](https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rex/RexUtil.java#L1142).
`digests.add` eventually leads to `RexNormalize.hashcode` method, which
reverses the `>, >=` rexnodes and returns same hashcode as the `<, <=`
expressions and thus they are not added to `digests`.
Now, when `CalciteSystemProperty.ENABLE_REX_DIGEST_NORMALIZE` is set to
`false`, we get all these duplicate expressions which are not pruned, and in
`RexSimplify.simplifyUsingPredicates` method, we calculate `range2`
[here](https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rex/RexSimplify.java#L1631),
by calling `residue` and passing the list of predicates
`predicates.pulledUpPredicates`. This list contains all the predicates that
have been seen before the current predicate. So when there are duplicates and
we see the same predicate for the second time, `residue` returns `Range.all()`
from
[here](https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rex/RexSimplify.java#L1695),
and this results in a `is not null`
[here](https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rex/RexSimplify.java#L1643).
I think all the changes in this commit are fine as all of them are because
of pruning of duplicate predicates. What do you think @zabetak ?
Issue Time Tracking
-------------------
Worklog Id: (was: 804056)
Time Spent: 40m (was: 0.5h)
> Use default digest normalization strategy in CBO
> ------------------------------------------------
>
> Key: HIVE-26312
> URL: https://issues.apache.org/jira/browse/HIVE-26312
> Project: Hive
> Issue Type: Task
> Components: CBO
> Affects Versions: 4.0.0-alpha-1
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> CALCITE-2450 introduced a way to improve planning time by normalizing some
> query expressions (RexNodes). The behavior can be enabled/disabled via the
> following system property: calcite.enable.rexnode.digest.normalize
> There was an attempt to disable the normalization explicitly in HIVE-23456 to
> avoid rendering HiveFilterSortPredicates rule useless. However, the [way the
> normalization is disabled
> now|https://github.com/apache/hive/blob/f29cb2245c97102975ea0dd73783049eaa0947a0/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L549],
> dependents on the way classes are loaded. If for some reason
> CalciteSystemProperty is loaded before hitting the respective line in
> Hive.java setting the property will not have any effect.
> After HIVE-26238 the behavior of the rule is not dependent in the value of
> the property so there is nothing holding us back from enabling the
> normalization.
> At the moment there is not strong reason to enable or disable the
> normalization explicitly so it is better to rely on the default value
> provided by Calcite to avoid running with different normalization strategy
> when the class loading order changes.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)