[
https://issues.apache.org/jira/browse/HIVE-27858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17817252#comment-17817252
]
Ryu Kobayashi commented on HIVE-27858:
--------------------------------------
[~okumin] and I investigated and found that the root cause was combinatorial
explosion of execution plans. [^query.sql] has 20 nested CTEs, and after
expansion, 40,000 Operators, including 4,000 TableScanOperators, are generated.
The final memory consumption is also huge, since each has a copied String,
HashMap, and ColumnInfo. The same issue happens even if it reduce the number of
columns.
heap dump:
!image-2024-02-14-17-30-59-359.png|width=482,height=213!
Operators:
!image-2024-02-14-17-31-32-714.png|width=488,height=21!
We believe that CTE materialization is currently the only effective way. This
query will succeed if we set `hive.optimize.cte.materialize.threshold` and
`hive.optimize.cte.materialize.full.aggregate.only=false`. However, due to a
known bug(HIVE-24167) in CTE materialization,
`hive.cbo.fallback.strategy=NEVER` will fail. Therefore, we need to disable CBO
or set `hive.cbo.fallback.strategy=CONSERVATIVE`. And also, re-investigate, we
found that this issue also happens in Hive 2 when
`hive.optimize.cte.materialize.threshold=-1`. So, it is considered not to be
degraded.
In summary, this query will succeed even in Hive 4 if I set the following
parameters: Therefore, I think that this ticket itself can be closed without
fix. What do you think?
{code:java}
set hive.optimize.cte.materialize.threshold=3;
set hive.optimize.cte.materialize.full.aggregate.only=false;
set hive.cbo.fallback.strategy=CONSERVATIVE ;{code}
> OOM happens when selecting many columns and JOIN.
> --------------------------------------------------
>
> Key: HIVE-27858
> URL: https://issues.apache.org/jira/browse/HIVE-27858
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Affects Versions: 4.0.0-beta-1
> Reporter: Ryu Kobayashi
> Assignee: okumin
> Priority: Critical
> Labels: hive-4.0.0-must
> Fix For: Not Applicable
>
> Attachments: HIVE-27858.full.q, HIVE-27858.less.columns.q, ddl.sql,
> query.sql
>
>
> OOM happens when executing [^query.sql] using a table in [^ddl.sql]. These
> did not happen in Hive 2 previously.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)