Stamatis Zampetakis created HIVE-27291:
------------------------------------------

             Summary: Constant reduction in CBO does not work for UNIX_TIMESTAMP
                 Key: HIVE-27291
                 URL: https://issues.apache.org/jira/browse/HIVE-27291
             Project: Hive
          Issue Type: Improvement
          Components: CBO
    Affects Versions: 4.0.0-alpha-2
            Reporter: Stamatis Zampetakis
            Assignee: Stamatis Zampetakis


{{UNIX_TIMESTAMP}} function always returns the same output given the same input 
for the duration of the query. In Hive terminology, this function is a 
[runtimeConstant|https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/udf/src/java/org/apache/hadoop/hive/ql/udf/UDFType.java#L72].

Such functions can be computed statically (reduced) at compile time and this 
happens successfully for the vast majority of them with the most relevant 
example being {{{}CURRENT_TIMESTAMP(){}}}.

However, constant reduction does not work for UNIX_TIMESTAMP in CBO:
{code:sql}
EXPLAIN CBO SELECT unix_timestamp();
{code}
{noformat}
HiveProject(_o__c0=[UNIX_TIMESTAMP()])
  HiveTableScan(table=[[_dummy_database, _dummy_table]], 
table:alias=[_dummy_table])
{noformat}
{code:sql}
EXPLAIN CBO SELECT unix_timestamp('2009-03-20', 'yyyy-MM-dd');
{code}
{noformat}
CBO PLAN:
HiveProject(_o__c0=[UNIX_TIMESTAMP(_UTF-16LE'2009-03-20':VARCHAR(2147483647) 
CHARACTER SET "UTF-16LE", _UTF-16LE'yyyy-MM-dd':VARCHAR(2147483647) CHARACTER 
SET "UTF-16LE")])
  HiveTableScan(table=[[_dummy_database, _dummy_table]], 
table:alias=[_dummy_table])
{noformat}
Observe that constant reduction works fine in the physical plan.
{code:sql}
EXPLAIN SELECT unix_timestamp();
{code}
{noformat}
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: _dummy_table
          Row Limit Per Split: 1
          Select Operator
            expressions: 1682411039L (type: bigint)
            outputColumnNames: _col0
            ListSink
{noformat}
Generally, we want to perform constant reduction as much as possible in CBO 
level cause it can affect expression pushdown in various storage handlers 
(HIVE-21388) but also predicate simplification/elimination.

Currently we fail to reduce {{UNIX_TIMESTAMP}} in CBO level cause the 
respective operator is marked as a 
[dynamicFunction|https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveUnixTimestampSqlOperator.java#L38]
 and the reduction rules in Calcite explicitly skip reduction [in this 
case|https://github.com/apache/calcite/blob/68b02dfd4af15bc94a91a0cd2a30655d04439555/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L1098].

As of Calcite 1.28.0, (CALCITE-2736) the reduction of dynamic functions becomes 
configurable so we may be able to exploit this feature. Alternatively, we will 
have to treat UNIX_TIMESTAMP in a similar fashion to CURRENT_TIMESTAMP and 
possibly rely on HiveSqlFunction.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to