[ 
https://issues.apache.org/jira/browse/HIVE-27102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843243#comment-17843243
 ] 

Stamatis Zampetakis commented on HIVE-27102:
--------------------------------------------

One major element that was introduced in Calcite 1.26.0 and heavily affects the
upgrade in Hive is the internal SEARCH operator (CALCITE-4173). As discussed
under the respective ticket, this operator aims to represent and unify various
kinds of expressions, notably IN, BETWEEN, and conjunctions/disjunctions of
range/equality predicates.
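
To make the unification idea concrete, here is a minimal toy sketch in plain Java. It does not use Calcite's own classes; {{SearchSketch}}, {{Interval}}, {{union}} and {{matches}} are hypothetical names chosen for illustration, and the model is limited to closed integer intervals. It only shows how an IN list, a BETWEEN, and a range conjunction over the same column can collapse into one normalized set of ranges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy model, NOT Calcite's API: a predicate on one column as a union of closed intervals. */
public class SearchSketch {

  /** Closed interval [lo, hi]; a single point has lo == hi. */
  static final class Interval {
    final int lo;
    final int hi;

    Interval(int lo, int hi) {
      this.lo = lo;
      this.hi = hi;
    }

    static Interval point(int v) {
      return new Interval(v, v);
    }

    boolean contains(int v) {
      return lo <= v && v <= hi;
    }

    @Override
    public String toString() {
      return lo == hi ? "[" + lo + "]" : "[" + lo + ".." + hi + "]";
    }
  }

  /** Disjunction of intervals: sort by lower bound, then merge the ones that overlap. */
  static List<Interval> union(List<Interval> parts) {
    List<Interval> sorted = new ArrayList<>(parts);
    sorted.sort(Comparator.comparingInt((Interval i) -> i.lo));
    List<Interval> merged = new ArrayList<>();
    for (Interval next : sorted) {
      int last = merged.size() - 1;
      if (last >= 0 && next.lo <= merged.get(last).hi) {
        Interval prev = merged.get(last);
        merged.set(last, new Interval(prev.lo, Math.max(prev.hi, next.hi)));
      } else {
        merged.add(next);
      }
    }
    return merged;
  }

  /** Evaluates the unified predicate: true if v falls in any of the intervals. */
  static boolean matches(List<Interval> ranges, int v) {
    for (Interval r : ranges) {
      if (r.contains(v)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // x IN (1, 3) OR x BETWEEN 5 AND 10 OR (x >= 8 AND x <= 12)
    List<Interval> ranges = union(Arrays.asList(
        Interval.point(1), Interval.point(3),
        new Interval(5, 10), new Interval(8, 12)));
    System.out.println(ranges);
    System.out.println(matches(ranges, 11));
  }
}
```

In this toy model the three original predicates end up as the two points 1 and 3 plus the single range 5..12, which is the kind of single normalized form that SEARCH is meant to provide.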

Since Hive does not know about the SEARCH operator, one idea was to try to
eliminate it from certain optimization phases in Hive by relying on
{{RexUtil#expandSearch}} and the {{HiveSearchExpandRule}} (which was introduced
in https://github.com/zabetak/hive/tree/calcite-upgrade-1.33). However, some
core APIs in Calcite, such as RelBuilder, RexSimplify, etc., now return the
internal SEARCH operator and thus affect many rules, metadata providers, and
other APIs. It might be difficult to ensure that the SEARCH operator is
completely eliminated from the plans, and doing this may result in brittle code.

Going forward, Calcite will rely more and more on the SEARCH operator, so
instead of trying to get rid of it we should embrace it and ensure that we are
handling it properly in Hive. In fact, we should try to use the SEARCH operator
as much as possible during the optimization phase and avoid back-and-forth
conversions between SEARCH, BETWEEN, IN, etc. Failure to do so will probably
make future upgrades harder and harder and increase the likelihood of
"infinite rule matching" compilation failures.

In Hive there are two kinds of rules that are strongly related to the SEARCH 
operator:
* HivePointLookupOptimizerRule (useful for normalization and runtime 
performance)
* HiveInBetweenExpandRule (useful for normalization and view-based rewriting)

With the advent of the SEARCH operator, these rules are heavily impacted. Parts
of the rules are probably redundant since RexSimplify should be able to handle
some, if not all, of their use cases. Ideally, these rules should be removed
altogether.

Since the physical evaluation of the IN operator seems to have some benefits
over the evaluation of OR (HIVE-11424), we should have some logic at the end of
the optimization phase that decides whether to translate the SEARCH operator to
a physical OR or to a physical IN operator.
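
As a rough illustration of what such end-of-optimization logic could look like, here is a toy sketch in plain Java. It is not Hive or Calcite code: {{SearchToPhysical}}, {{Interval}}, {{translate}} and the {{minInSize}} threshold are all hypothetical, and the output is a plain string rather than an expression tree. The assumption is simply that point values are grouped into an IN list once there are enough of them, while everything else stays a disjunction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Toy sketch, NOT Hive/Calcite code: pick IN vs OR when leaving the SEARCH form. */
public class SearchToPhysical {

  /** Closed interval [lo, hi]; a single point has lo == hi. */
  static final class Interval {
    final int lo;
    final int hi;

    Interval(int lo, int hi) {
      this.lo = lo;
      this.hi = hi;
    }
  }

  /**
   * Renders the ranges of one column as SQL-like text.
   * minInSize is a hypothetical threshold: with at least that many point
   * values an IN list is emitted, otherwise each point becomes an equality.
   */
  static String translate(String column, List<Interval> ranges, int minInSize) {
    List<Interval> points = new ArrayList<>();
    List<Interval> spans = new ArrayList<>();
    for (Interval r : ranges) {
      (r.lo == r.hi ? points : spans).add(r);
    }

    List<String> disjuncts = new ArrayList<>();
    if (!points.isEmpty() && points.size() >= minInSize) {
      String values = points.stream()
          .map(p -> String.valueOf(p.lo))
          .collect(Collectors.joining(", "));
      disjuncts.add(column + " IN (" + values + ")");
    } else {
      for (Interval p : points) {
        disjuncts.add(column + " = " + p.lo);
      }
    }
    for (Interval s : spans) {
      disjuncts.add(column + " BETWEEN " + s.lo + " AND " + s.hi);
    }
    return String.join(" OR ", disjuncts);
  }

  public static void main(String[] args) {
    List<Interval> onlyPoints = Arrays.asList(
        new Interval(1, 1), new Interval(3, 3), new Interval(7, 7));
    List<Interval> mixed = Arrays.asList(
        new Interval(1, 1), new Interval(5, 10));
    System.out.println(translate("x", onlyPoints, 2));
    System.out.println(translate("x", mixed, 2));
  }
}
```

Keeping this decision in a single place at the end of planning would let the rest of the optimizer work purely on SEARCH, which is the direction argued for above.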

Apart from the aforementioned rules, there are probably other places where we
have to handle the SEARCH operator and, correspondingly, get rid of the IN
operator, but this seems to be the best way forward.

Note that the presence of the SEARCH operator in the EXPLAIN CBO plan is not 
something to be avoided if the resulting plans are equivalent or better. We 
should focus only on cases where we spot regressions and decide how to tackle 
them. The plan changes that need to be more carefully reviewed are those at the 
physical layer in order to ensure that we don't have performance regressions.

> Upgrade Calcite to 1.33.0 and Avatica to 1.23.0
> -----------------------------------------------
>
>                 Key: HIVE-27102
>                 URL: https://issues.apache.org/jira/browse/HIVE-27102
>             Project: Hive
>          Issue Type: Improvement
>          Components: CBO
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>
> New versions for Calcite and Avatica are available so we should upgrade to 
> them.
> I had some WIP in HIVE-26610 for upgrading calcite to 1.32.0 but given that 
> the work was not in very advanced state it is preferred to jump directly to 
> 1.33.0.
> Avatica must be in line with Calcite so both need to be updated at the same 
> time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
