Alessandro Solimando created HIVE-25758:
-------------------------------------------
Summary: OOM due to recursive application CBO rules
Key: HIVE-25758
URL: https://issues.apache.org/jira/browse/HIVE-25758
Project: Hive
Issue Type: Bug
Components: CBO, Query Planning
Affects Versions: 4.0.0
Reporter: Alessandro Solimando
Reproducing query is as follows:
{code:java}
create table test1 (act_nbr string);
create table test2 (month int);
create table test3 (mth int, con_usd double);
EXPLAIN
SELECT c.month,
d.con_usd
FROM
(SELECT
cast(regexp_replace(substr(add_months(from_unixtime(unix_timestamp(),
'yyyy-MM-dd'), -1), 1, 7), '-', '') AS int) AS month
FROM test1
UNION ALL
SELECT month
FROM test2
WHERE month = 202110) c
JOIN test3 d ON c.month = d.mth; {code}
Different plans are generated during the first CBO steps, last being:
{noformat}
2021-12-01T08:28:08,598 DEBUG [a18191bb-3a2b-4193-9abf-4e37dd1996bb main]
parse.CalcitePlanner: Plan after decorre
lation:
HiveProject(month=[$0], con_usd=[$2])
HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none], cost=[not
available])
HiveProject(month=[$0])
HiveUnion(all=[true])
HiveProject(month=[CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP,
_UTF-16LE'yyyy-MM-d
d':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1, 7),
_UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF-
16LE", _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER])
HiveTableScan(table=[[default, test1]], table:alias=[test1])
HiveProject(month=[$0])
HiveFilter(condition=[=($0, CAST(202110):INTEGER)])
HiveTableScan(table=[[default, test2]], table:alias=[test2])
HiveTableScan(table=[[default, test3]], table:alias=[d]){noformat}
Then, the HEP planner will keep expanding the filter expression with redundant
expressions, such as the following, where the identical CAST expression is
present multiple times:
{noformat}
rel#118:HiveFilter.HIVE.[].any(input=HepRelVertex#39,condition=IN(CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP,
_UTF-16LE'yyyy-MM-dd':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1,
7), _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF-16LE",
_UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER,
CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP,
_UTF-16LE'yyyy-MM-dd':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1,
7), _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF-16LE",
_UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER,
202110)){noformat}
The problem seems to come from a bad interaction of at least
_HiveFilterProjectTransposeRule_ and
{_}HiveJoinPushTransitivePredicatesRule{_}, possibly more.
Most probably then UNION part can be removed and the reproducer be simplified
even further.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)