[
https://issues.apache.org/jira/browse/HIVE-27102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843243#comment-17843243
]
Stamatis Zampetakis commented on HIVE-27102:
--------------------------------------------
One major element that was introduced in Calcite 1.26.0 and heavily affects the
upgrade in Hive is the internal SEARCH operator (CALCITE-4173). As discussed
under the respective ticket, this operator aims to represent and unify various
kinds of expressions, notably IN, BETWEEN, and conjunctions/disjunctions with
range/equality predicates.
Since Hive does not know about the SEARCH operator, one idea was to try to
eliminate it from certain optimization phases in Hive by relying on
{{RexUtil#expandSearch}} and the {{HiveSearchExpandRule}} (which was introduced
in https://github.com/zabetak/hive/tree/calcite-upgrade-1.33). However, some
core APIs in Calcite, such as RelBuilder and RexSimplify, now return the
internal SEARCH operator and thus affect many rules, metadata providers, and
other APIs. It might be difficult to ensure that the SEARCH operator is
completely eliminated from the plans, and doing this may result in brittle code.
Going forward, Calcite will rely more and more on the SEARCH operator, so
instead of trying to get rid of it we should embrace it and ensure that we are
handling it properly in Hive. In fact, we should try to use the SEARCH operator
as much as possible during the optimization phase and avoid back-and-forth
conversions between SEARCH, BETWEEN, IN, etc. Failure to do so will probably
make future upgrades harder and harder and will increase the likelihood of
"infinite rule matching" compilation failures.
In Hive there are two kinds of rules that are strongly related to the SEARCH
operator:
* HivePointLookupOptimizerRule (useful for normalization and runtime
performance)
* HiveInBetweenExpandRule (useful for normalization and view-based rewriting)
With the advent of the SEARCH operator, these rules are heavily impacted. Parts
of the rules are probably redundant since RexSimplify should be able to handle
some if not all of their use cases. Ideally, these rules should be removed
altogether.
Since the physical evaluation of the IN operator seems to have some benefits
over the evaluation of OR (HIVE-11424), we should have some logic at the end of
the optimization phase that decides whether to translate the SEARCH operator
to a physical OR or to a physical IN operator.
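To make the idea concrete, here is a minimal, self-contained sketch of such a
decision. It is not Hive or Calcite code: the class {{SearchToPhysical}}, the
{{Interval}} type, and the {{minInSize}} threshold are all hypothetical names
used only for illustration, and the sketch models the SEARCH argument as a
plain list of closed integer intervals. Point values are grouped into a single
IN when there are enough of them; otherwise they are emitted as equalities
joined by OR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; does not use the Hive or Calcite APIs. */
final class SearchToPhysical {

    /** A closed interval [lo, hi]; a single point has lo == hi. */
    static final class Interval {
        final int lo;
        final int hi;

        Interval(int lo, int hi) {
            this.lo = lo;
            this.hi = hi;
        }

        boolean isPoint() {
            return lo == hi;
        }
    }

    /**
     * Renders the intervals as a predicate string over the given column.
     * Points become one IN list if there are at least minInSize of them
     * (an assumed tuning knob), otherwise separate equalities; proper
     * ranges always become BETWEEN. All disjuncts are joined with OR.
     */
    static String translate(String column, List<Interval> intervals, int minInSize) {
        List<String> points = new ArrayList<>();
        List<String> ranges = new ArrayList<>();
        for (Interval i : intervals) {
            if (i.isPoint()) {
                points.add(Integer.toString(i.lo));
            } else {
                ranges.add(column + " BETWEEN " + i.lo + " AND " + i.hi);
            }
        }
        List<String> disjuncts = new ArrayList<>();
        if (!points.isEmpty() && points.size() >= minInSize) {
            disjuncts.add(column + " IN (" + String.join(", ", points) + ")");
        } else {
            for (String p : points) {
                disjuncts.add(column + " = " + p);
            }
        }
        disjuncts.addAll(ranges);
        return String.join(" OR ", disjuncts);
    }
}
```

In a real implementation the threshold and the handling of open ranges, NULLs,
and non-integer types would need to be driven by what the physical operators
actually support.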
Apart from the aforementioned rules, there are probably other places where we
have to consider the SEARCH operator and, correspondingly, get rid of the IN
operator, but it seems that this is the best way forward.
Note that the presence of the SEARCH operator in the EXPLAIN CBO plan is not
something to be avoided if the resulting plans are equivalent or better. We
should focus only on cases where we spot regressions and decide how to tackle
them. The plan changes that need to be more carefully reviewed are those at the
physical layer in order to ensure that we don't have performance regressions.
> Upgrade Calcite to 1.33.0 and Avatica to 1.23.0
> -----------------------------------------------
>
> Key: HIVE-27102
> URL: https://issues.apache.org/jira/browse/HIVE-27102
> Project: Hive
> Issue Type: Improvement
> Components: CBO
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
>
> New versions of Calcite and Avatica are available so we should upgrade to
> them.
> I had some WIP in HIVE-26610 for upgrading Calcite to 1.32.0 but given that
> the work was not in a very advanced state it is preferred to jump directly to
> 1.33.0.
> Avatica must be in line with Calcite so both need to be updated at the same
> time.