[ 
https://issues.apache.org/jira/browse/HIVE-25953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716305#comment-17716305
 ] 

Stamatis Zampetakis commented on HIVE-25953:
--------------------------------------------

The differences between 
[HiveRelMdPredicates|https://github.com/apache/hive/blob/ac48a8b080648096b545034882003ff7847d60b8/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java]
 and 
[RelMdPredicates|https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rel/metadata/RelMdPredicates.java]
 (on Calcite 1.25.0), as far as the {{RelOptPredicateList 
getPredicates(Join join, RelMetadataQuery mq)}} method and the data structures 
directly used by it are concerned, are outlined below.

+Hive only+
 * Possibility to infer predicates from ANTI joins (HIVE-23716)
 * Support pull-up of predicates without input references (HIVE-13803)

+Calcite only+
 * Using object equality instead of string equality when comparing RexNode 
expressions (CALCITE-2632)
 * Explicit simplification of predicates pulled from join inputs 
(CALCITE-2205/CALCITE-2604)

In order to safely drop the in-house implementation of {{RelOptPredicateList 
getPredicates(Join join, RelMetadataQuery mq)}} and use the one from Calcite, we 
should port the Hive-specific changes to Calcite (assuming that the 
Calcite-only changes are always beneficial).

The possibility to infer predicates from ANTI joins is an improvement that will 
land soon in Calcite (CALCITE-5675).

The pull-up of predicates without input references is debatable and probably 
should be dropped from Hive rather than landing in Calcite. The feature was 
introduced explicitly by HIVE-13803 in an attempt to pull "false" predicates 
(which essentially do not reference any input) from one side of the join and 
propagate them into the other side of the join. However, after HIVE-26524 the 
pruning rules can remove entire joins, so the false predicates never appear in 
the plan.
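
As a rough illustration of why a "false" predicate on one join input makes the 
propagation unnecessary, here is a toy sketch of generic empty-input pruning. 
The class {{PruneSketch}}, the enum and the returned labels are hypothetical; 
this is not the code of HIVE-26524 nor any Calcite rule, only the join 
semantics such rules rely on. An input filtered by "false" is treated as empty.

```java
// Toy model of empty-input pruning for joins (hypothetical names, not Hive/Calcite API).
final class PruneSketch {
    enum JoinType { INNER, LEFT, ANTI }

    // Describes what remains of "left JOIN right" when an input is known to be empty.
    static String prune(JoinType type, boolean leftEmpty, boolean rightEmpty) {
        if (leftEmpty) {
            // INNER, LEFT and ANTI joins only ever emit rows that originate from the left input.
            return "EMPTY";
        }
        if (rightEmpty) {
            switch (type) {
                case INNER:
                    return "EMPTY";                  // no left row can find a match
                case ANTI:
                    return "LEFT_INPUT";             // no left row has a match, so all survive
                case LEFT:
                    return "LEFT_INPUT_NULL_PADDED"; // all left rows, right columns are NULL
            }
        }
        return "JOIN"; // nothing to prune
    }
}
```

In each pruned case the join operator itself is gone, so there is no join left 
through which a "false" predicate could be pulled up and pushed down.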

One can argue that we can still have predicates that do not reference any 
inputs (such as UNIX_TIMESTAMP() > 1681909077836), but the benefit of moving 
them around in the plan is less obvious. Consider the following SQL query and 
the respective plan.
{code:sql}
EXPLAIN CBO SELECT * 
FROM (SELECT ename, did FROM emp WHERE UNIX_TIMESTAMP() > 1681909077836) e
INNER JOIN dept d ON d.did = e.did;
{code}
{noformat}
HiveJoin(condition=[=($2, $1)], joinType=[inner], algorithm=[none], cost=[not available])
  HiveProject(ename=[$1], did=[$2])
    HiveFilter(condition=[AND(>(UNIX_TIMESTAMP(), 1681909077836), IS NOT NULL($2))])
      HiveTableScan(table=[[default, emp]], table:alias=[emp])
  HiveProject(did=[$0], dname=[$1])
    HiveFilter(condition=[AND(>(UNIX_TIMESTAMP(), 1681909077836), IS NOT NULL($0))])
      HiveTableScan(table=[[default, dept]], table:alias=[d])
{noformat}
Observe that, due to the special Hive logic for pulling predicates, we can pull 
{{>(UNIX_TIMESTAMP(), 1681909077836)}} from the left side of the join and push 
it to the right. Note that this pull/push logic is only valid for deterministic 
predicates 
([https://github.com/apache/calcite/blob/e7375ae745ec18ce9df68b4945bb521ae49a053c/core/src/main/java/org/apache/calcite/sql/SqlOperator.java#L1048]).
 If the predicate is not deterministic then it is not valid to transfer the 
predicate above a filter 
([https://github.com/apache/calcite/blob/e7375ae745ec18ce9df68b4945bb521ae49a053c/core/src/main/java/org/apache/calcite/rel/metadata/RelMdPredicates.java#L305]);
 consider for example {{RAND()}}.
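
The two conditions above (no input references, deterministic) can be sketched 
with a toy model. {{TransferSketch}}, {{Pred}} and {{transferable}} are 
hypothetical names and not part of Hive or Calcite; predicates that do 
reference inputs would additionally need rewriting through the join 
equivalences, which this sketch deliberately ignores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model (hypothetical, not the Hive/Calcite API) of the Hive-only transfer logic.
final class TransferSketch {
    static final class Pred {
        final String text;             // printed form of the predicate
        final Set<Integer> inputRefs;  // indexes of the input columns it reads
        final boolean deterministic;   // same arguments always give the same result

        Pred(String text, Set<Integer> inputRefs, boolean deterministic) {
            this.text = text;
            this.inputRefs = inputRefs;
            this.deterministic = deterministic;
        }
    }

    // Predicates of one join input that may be copied unchanged to the other input.
    static List<String> transferable(List<Pred> fromOneSide) {
        List<String> out = new ArrayList<>();
        for (Pred p : fromOneSide) {
            // A non-deterministic predicate (e.g. on RAND()) may evaluate differently
            // when re-evaluated elsewhere, so it must stay where it is.
            if (p.deterministic && p.inputRefs.isEmpty()) {
                out.add(p.text);
            }
        }
        return out;
    }
}
```

With the left-side filter from the plan above plus a hypothetical 
{{<(RAND(), 0.5)}} conjunct, only {{>(UNIX_TIMESTAMP(), 1681909077836)}} 
passes both checks.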
      
Based on the definition of a deterministic operator, the same input always 
gives the same output. This means that a function that does not reference any 
inputs and at the same time is deterministic can be evaluated statically at 
compile time; the result will be either true or false and will be further 
simplified and disappear from the plan.
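
A minimal sketch of this reasoning, using a toy predicate model rather than the 
actual reduction rules: {{ReduceSketch}}, {{Pred}}, {{reduce}} and 
{{simplifyFilter}} are hypothetical names introduced only for illustration.

```java
import java.util.Optional;
import java.util.function.BooleanSupplier;

// Toy model (hypothetical, not the Hive/Calcite API) of plan-time constant folding.
final class ReduceSketch {
    static final class Pred {
        final boolean referencesInput;  // reads at least one input column
        final boolean deterministic;    // same arguments always give the same result
        final BooleanSupplier eval;     // evaluates the predicate without any row

        Pred(boolean referencesInput, boolean deterministic, BooleanSupplier eval) {
            this.referencesInput = referencesInput;
            this.deterministic = deterministic;
            this.eval = eval;
        }
    }

    // Folds the predicate to a constant when it is deterministic and input-free.
    static Optional<Boolean> reduce(Pred p) {
        if (p.deterministic && !p.referencesInput) {
            return Optional.of(p.eval.getAsBoolean());
        }
        return Optional.empty();
    }

    // What a simplifier could do with a filter whose condition is this predicate.
    static String simplifyFilter(Pred p) {
        Optional<Boolean> constant = reduce(p);
        if (!constant.isPresent()) {
            return "keep filter";
        }
        return constant.get() ? "drop filter (always true)" : "prune input (always false)";
    }
}
```

Either outcome of the folding removes the predicate from the plan, so nothing 
remains to be pulled up through a join.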

In the end, the special logic for handling predicates without input references 
is unnecessary when constant reduction and pruning rules are present. Both 
kinds of rules are present in Hive, so the pull/push logic is redundant. One 
caveat is the UNIX_TIMESTAMP() > 1681909077836 example shown above: even though 
it is deterministic, it is not reduced to a constant due to HIVE-27291; other 
similar UDFs such as CURRENT_TIMESTAMP, CURRENT_DATE, etc., are reduced as 
expected. HIVE-27291 is an edge case and should be fixed in the near future, 
but it is not really blocking for this ticket.

> Drop HiveRelMdPredicates::getPredicates(Join...) to use that of 
> RelMdPredicates
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-25953
>                 URL: https://issues.apache.org/jira/browse/HIVE-25953
>             Project: Hive
>          Issue Type: Sub-task
>          Components: CBO
>    Affects Versions: 4.0.0
>            Reporter: Alessandro Solimando
>            Assignee: Alessandro Solimando
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The goal of the ticket is to unify the two implementations and remove the 
> override in HiveRelMdPredicates.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
