[jira] [Commented] (HIVE-27858) OOM happens when selecting many columns and JOIN.

John Sherman (Jira) Fri, 08 Dec 2023 17:10:19 -0800


    [ 
https://issues.apache.org/jira/browse/HIVE-27858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17794895#comment-17794895
 ]


John Sherman commented on HIVE-27858:
-------------------------------------

I did some preliminary investigation here.
If I remove all the columns except the ones being joined or operated on, the 
query gets past the point of generating the initial CBO plan.
That plan is 307402 lines long. (I had to CTRL-C because it was taking more 
than 10 minutes.)
{code:java}
apache/hive master > cat hive.plan.log | wc -l
  307402 {code}
I thought the plan might be so complex because of the recursive WITH clauses 
and because the CTEs are not being materialized, so I set:
{code:java}
 set hive.optimize.cte.materialize.full.aggregate.only=false;
{code}
in the hope that it would generate a better plan.
The query then fails much sooner with:
{code:java}
 java.lang.RuntimeException: equivalence mapping violation
{code}
which is https://issues.apache.org/jira/browse/HIVE-24167

If I mechanically fix HIVE-24167 (I don't have the domain knowledge to know 
whether the change is correct in all cases) by:
{code:java}
diff --git a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java b/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
index d7744587e6..5338bb61e0 100644
--- a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
+++ b/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
@@ -3273,7 +3273,7 @@ private static Statistics applyRuntimeStats(Context context, Statistics stats, O
 
     PlanMapper pm = context.getPlanMapper();
     OpTreeSignature treeSig = pm.getSignatureOf(op);
-    pm.link(op, treeSig);
+    pm.merge(op, treeSig);
 
     StatsSource statsSource = context.getStatsSource();
     if (!statsSource.canProvideStatsFor(op.getClass())) {
diff --git a/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
index 0823b6d9ba..0de4e83624 100644
--- a/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
+++ b/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
@@ -3767,8 +3767,7 @@ private Operator genFilterPlan(QB qb, ASTNode condn, Operator input, boolean use
     Operator output = putOpInsertMap(OperatorFactory.getAndMakeChild(
         new FilterDesc(filterCond, false), new RowSchema(
             inputRR.getColumnInfos()), input), inputRR);
-
-    ctx.getPlanMapper().link(condn, output);
+    ctx.getPlanMapper().merge(condn, output);
 
     LOG.debug("Created Filter Plan for {} row schema: {}", qb.getId(), inputRR.toString());
     return output;
{code}
With this change the reduced-column query compiles successfully, and so does 
the query with all the columns.
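
As a rough illustration of why swapping link for merge in the diff above can 
make the "equivalence mapping violation" disappear, here is a minimal sketch. 
It assumes a mapper that keeps objects in equivalence groups, where link 
refuses to join two objects that already sit in different groups and merge 
unions those groups instead. The class EquivalenceSketch and its methods are 
hypothetical and heavily simplified; this is not Hive's PlanMapper, and it says 
nothing about whether merging is semantically correct for these call sites.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

/** Simplified, hypothetical model of an equivalence mapper; NOT Hive's PlanMapper. */
public class EquivalenceSketch {
  // Each object maps (by identity) to the single group it belongs to.
  private final Map<Object, Set<Object>> groupOf = new IdentityHashMap<>();

  /** Strict variant: fails if the two objects already sit in different groups. */
  public void link(Object a, Object b) {
    connect(a, b, false);
  }

  /** Lenient variant: unions the two groups when they differ. */
  public void merge(Object a, Object b) {
    connect(a, b, true);
  }

  private void connect(Object a, Object b, boolean mayMerge) {
    Set<Object> ga = groupOf.get(a);
    Set<Object> gb = groupOf.get(b);
    if (ga != null && gb != null && ga != gb && !mayMerge) {
      // Checked before any mutation, so a failed link leaves the state untouched.
      throw new RuntimeException("equivalence mapping violation");
    }
    Set<Object> target = ga != null ? ga : (gb != null ? gb : newGroup());
    if (gb != null && gb != target) {
      // Fold the second group into the first and repoint its members.
      for (Object member : gb) {
        groupOf.put(member, target);
      }
      target.addAll(gb);
    }
    target.add(a);
    target.add(b);
    groupOf.put(a, target);
    groupOf.put(b, target);
  }

  private static Set<Object> newGroup() {
    return Collections.newSetFromMap(new IdentityHashMap<>());
  }

  /** True when both objects are known and belong to the same group. */
  public boolean sameGroup(Object a, Object b) {
    Set<Object> ga = groupOf.get(a);
    return ga != null && ga == groupOf.get(b);
  }

  public static void main(String[] args) {
    EquivalenceSketch mapper = new EquivalenceSketch();
    Object filterA = "filterA", sigA = "sigA", filterB = "filterB", sigB = "sigB";
    mapper.link(filterA, sigA); // first group
    mapper.link(filterB, sigB); // second, unrelated group
    try {
      mapper.link(sigA, sigB);  // strict link across two existing groups
    } catch (RuntimeException e) {
      System.out.println("link failed: " + e.getMessage());
    }
    mapper.merge(sigA, sigB);   // lenient merge unions the groups
    System.out.println("merged: " + mapper.sameGroup(filterA, filterB));
  }
}
```

In this model the strict call surfaces a conflict that the lenient call 
silently resolves, which is why the mechanical change needs review by someone 
who knows whether the two groups really are equivalent.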

I likely do not have the cycles to continue the investigation and build up the 
domain knowledge needed to fix HIVE-24167 (or to investigate why the plan 
explodes without "hive.optimize.cte.materialize.full.aggregate.only" set to 
false).
I've attached the test files for both the reduced-column version and the 
full-column version.

However, I will highlight it to some folks to see if they can spend some time 
soon investigating this JIRA and HIVE-24167.

> OOM happens when selecting many columns and JOIN.
> --------------------------------------------------
>
>                 Key: HIVE-27858
>                 URL: https://issues.apache.org/jira/browse/HIVE-27858
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 4.0.0-beta-1
>            Reporter: Ryu Kobayashi
>            Assignee: John Sherman
>            Priority: Critical
>              Labels: hive-4.0.0-must
>         Attachments: ddl.sql, query.sql
>
>
> OOM happens when executing [^query.sql] using a table in [^ddl.sql]. These 
> did not happen in Hive 2 previously.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
