Stamatis Zampetakis created HIVE-27291:
------------------------------------------
Summary: Constant reduction in CBO does not work for UNIX_TIMESTAMP
Key: HIVE-27291
URL: https://issues.apache.org/jira/browse/HIVE-27291
Project: Hive
Issue Type: Improvement
Components: CBO
Affects Versions: 4.0.0-alpha-2
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis
The {{UNIX_TIMESTAMP}} function always returns the same output given the same input
for the duration of the query. In Hive terminology, this function is a
[runtimeConstant|https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/udf/src/java/org/apache/hadoop/hive/ql/udf/UDFType.java#L72].
Such functions can be computed statically (reduced) at compile time, and this
happens successfully for the vast majority of them, the most relevant
example being {{CURRENT_TIMESTAMP()}}.
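To illustrate what "reduced at compile time" means for a call whose arguments are all literals, here is a minimal Java sketch. It is not Hive code: the class and method names are hypothetical, only whole-day patterns are handled, and the time zone is passed explicitly, whereas the value Hive produces depends on the configured time zone.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Illustrative only: folds a call with literal arguments into a single literal. */
final class FoldUnixTimestampSketch {

  /** Epoch seconds at the start of the parsed day in the given zone. */
  static long unixTimestamp(String text, String pattern, ZoneId zone) {
    LocalDate date = LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern));
    return date.atStartOfDay(zone).toEpochSecond();
  }

  public static void main(String[] args) {
    // The planner could embed this value in the plan instead of the call itself.
    long literal = unixTimestamp("2009-03-20", "yyyy-MM-dd", ZoneOffset.UTC);
    System.out.println(literal); // prints 1237507200
  }
}
```

Because both arguments are known before execution starts, nothing about the result depends on the rows being processed, which is what makes the expression eligible for reduction.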
However, constant reduction does not work for UNIX_TIMESTAMP in CBO:
{code:sql}
EXPLAIN CBO SELECT unix_timestamp();
{code}
{noformat}
HiveProject(_o__c0=[UNIX_TIMESTAMP()])
  HiveTableScan(table=[[_dummy_database, _dummy_table]], table:alias=[_dummy_table])
{noformat}
{code:sql}
EXPLAIN CBO SELECT unix_timestamp('2009-03-20', 'yyyy-MM-dd');
{code}
{noformat}
CBO PLAN:
HiveProject(_o__c0=[UNIX_TIMESTAMP(_UTF-16LE'2009-03-20':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'yyyy-MM-dd':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")])
  HiveTableScan(table=[[_dummy_database, _dummy_table]], table:alias=[_dummy_table])
{noformat}
Observe that constant reduction works fine in the physical plan.
{code:sql}
EXPLAIN SELECT unix_timestamp();
{code}
{noformat}
STAGE DEPENDENCIES:
  Stage-0 is a root stage
STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: _dummy_table
          Row Limit Per Split: 1
          Select Operator
            expressions: 1682411039L (type: bigint)
            outputColumnNames: _col0
            ListSink
{noformat}
Generally, we want to perform as much constant reduction as possible at the CBO
level because it can affect expression pushdown in various storage handlers
(HIVE-21388) as well as predicate simplification/elimination.
Currently, we fail to reduce {{UNIX_TIMESTAMP}} at the CBO level because the
respective operator is marked as a
[dynamicFunction|https://github.com/apache/hive/blob/59058c65457fb7ab9d8575a555034e6633962661/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveUnixTimestampSqlOperator.java#L38]
and the reduction rules in Calcite explicitly skip reduction [in this
case|https://github.com/apache/calcite/blob/68b02dfd4af15bc94a91a0cd2a30655d04439555/core/src/main/java/org/apache/calcite/rel/rules/ReduceExpressionsRule.java#L1098].
As of Calcite 1.28.0 (CALCITE-2736), the reduction of dynamic functions is
configurable so we may be able to exploit this feature. Alternatively, we will
have to treat UNIX_TIMESTAMP in a similar fashion to CURRENT_TIMESTAMP and
possibly rely on HiveSqlFunction.
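The behaviour described above can be sketched with a small self-contained Java model. This is not Calcite or Hive code: every class, field, and parameter name is hypothetical, and the {{reduceDynamic}} flag merely stands in for the kind of configuration option introduced by CALCITE-2736. The timestamp value is the one shown in the physical plan earlier in this description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Toy model of a constant reducer; all names here are hypothetical. */
final class ReductionSketch {

  /** Expression node: either a literal or a function call. */
  interface Expr {}

  static final class Literal implements Expr {
    final Object value;
    Literal(Object value) { this.value = value; }
    @Override public String toString() { return String.valueOf(value); }
  }

  static final class Call implements Expr {
    final String name;
    final boolean dynamic; // stands in for an operator flagged as a dynamic function
    final List<Expr> operands;
    final Function<List<Object>, Object> impl;

    Call(String name, boolean dynamic, List<Expr> operands,
         Function<List<Object>, Object> impl) {
      this.name = name;
      this.dynamic = dynamic;
      this.operands = operands;
      this.impl = impl;
    }

    @Override public String toString() {
      StringBuilder sb = new StringBuilder(name).append('(');
      for (int i = 0; i < operands.size(); i++) {
        if (i > 0) sb.append(", ");
        sb.append(operands.get(i));
      }
      return sb.append(')').toString();
    }
  }

  /** Folds calls whose operands are all literals; dynamic calls fold only when allowed. */
  static Expr reduce(Expr e, boolean reduceDynamic) {
    if (!(e instanceof Call)) {
      return e;
    }
    Call c = (Call) e;
    List<Expr> reduced = new ArrayList<>();
    List<Object> values = new ArrayList<>();
    boolean allLiteral = true;
    for (Expr operand : c.operands) {
      Expr r = reduce(operand, reduceDynamic);
      reduced.add(r);
      if (r instanceof Literal) {
        values.add(((Literal) r).value);
      } else {
        allLiteral = false;
      }
    }
    if (allLiteral && (!c.dynamic || reduceDynamic)) {
      return new Literal(c.impl.apply(values));
    }
    return new Call(c.name, c.dynamic, reduced, c.impl);
  }

  public static void main(String[] args) {
    long frozenNow = 1682411039L; // fixed for the duration of the "query"
    Expr ts = new Call("UNIX_TIMESTAMP", true, new ArrayList<>(), a -> frozenNow);
    System.out.println(reduce(ts, false)); // prints UNIX_TIMESTAMP()
    System.out.println(reduce(ts, true));  // prints 1682411039
  }
}
```

With the flag off, the call survives reduction, as in the CBO plans above; with it on, the call collapses to a literal. The alternative mentioned above, treating UNIX_TIMESTAMP like CURRENT_TIMESTAMP, would correspond in this model to not flagging the operator as dynamic in the first place.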