[
https://issues.apache.org/jira/browse/HIVE-27858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17794895#comment-17794895
]
John Sherman commented on HIVE-27858:
-------------------------------------
I did some preliminary investigation here.
If I remove all the columns except the ones being joined or operated on, the
query gets past the point of generating the initial CBO plan. That plan is
307402 lines long. (I had to CTRL-C because it was taking more than 10
minutes.)
{code:java}
apache/hive master > cat hive.plan.log | wc -l
307402
{code}
I thought the plan might be this complex because of the recursive WITH clauses
and because the CTEs are not being materialized, so I set:
{code:java}
set hive.optimize.cte.materialize.full.aggregate.only=false;
{code}
in the hope that it would generate a better plan.
The query then fails much more quickly with:
{code:java}
java.lang.RuntimeException: equivalence mapping violation
{code}
which is https://issues.apache.org/jira/browse/HIVE-24167
If I mechanically fix HIVE-24167 (I don't have the domain knowledge to know
whether the change is correct in all cases) by:
{code:java}
diff --git a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java b/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
index d7744587e6..5338bb61e0 100644
--- a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
+++ b/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
@@ -3273,7 +3273,7 @@ private static Statistics applyRuntimeStats(Context context, Statistics stats, O
PlanMapper pm = context.getPlanMapper();
OpTreeSignature treeSig = pm.getSignatureOf(op);
- pm.link(op, treeSig);
+ pm.merge(op, treeSig);
StatsSource statsSource = context.getStatsSource();
if (!statsSource.canProvideStatsFor(op.getClass())) {
diff --git a/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
index 0823b6d9ba..0de4e83624 100644
--- a/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
+++ b/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
@@ -3767,8 +3767,7 @@ private Operator genFilterPlan(QB qb, ASTNode condn, Operator input, boolean use
Operator output = putOpInsertMap(OperatorFactory.getAndMakeChild(
new FilterDesc(filterCond, false), new RowSchema(
inputRR.getColumnInfos()), input), inputRR);
-
- ctx.getPlanMapper().link(condn, output);
+ ctx.getPlanMapper().merge(condn, output);
LOG.debug("Created Filter Plan for {} row schema: {}", qb.getId(),
inputRR.toString());
return output;
{code}
With the patch above applied, the reduced query compiles successfully, and so
does the query with all the columns.
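For readers unfamiliar with the link/merge distinction, the following is a toy
sketch of the behaviour the patch relies on. It is not Hive's PlanMapper: the
class ToyPlanMapper and its sameGroup helper are hypothetical, and the sketch
assumes that link refuses to join two objects that already belong to different
equivalence groups (raising the exception seen above), while merge unions the
two groups instead.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

// Toy model only (hypothetical names); it is NOT Hive's PlanMapper.
// Assumption: link() is strict, merge() unions conflicting groups.
class ToyPlanMapper {
    // Maps each object (by identity) to the equivalence group containing it.
    private final Map<Object, Set<Object>> groupOf = new IdentityHashMap<>();

    /** Strict association: fails if both objects already sit in different groups. */
    void link(Object a, Object b) {
        associate(a, b, false);
    }

    /** Lenient association: unions the two groups when they differ. */
    void merge(Object a, Object b) {
        associate(a, b, true);
    }

    /** True if both objects are known and share one equivalence group. */
    boolean sameGroup(Object a, Object b) {
        Set<Object> ga = groupOf.get(a);
        return ga != null && ga == groupOf.get(b);
    }

    private void associate(Object a, Object b, boolean mayMerge) {
        Set<Object> ga = groupOf.get(a);
        Set<Object> gb = groupOf.get(b);
        // Conflict: each object already belongs to a distinct group.
        if (ga != null && gb != null && ga != gb && !mayMerge) {
            throw new RuntimeException("equivalence mapping violation");
        }
        Set<Object> target = (ga != null) ? ga : (gb != null ? gb : newGroup());
        if (gb != null && gb != target) {
            // Union: move every member of b's group into the target group.
            for (Object member : gb) {
                target.add(member);
                groupOf.put(member, target);
            }
        }
        target.add(a);
        target.add(b);
        groupOf.put(a, target);
        groupOf.put(b, target);
    }

    private static Set<Object> newGroup() {
        return Collections.newSetFromMap(new IdentityHashMap<>());
    }
}
```

In this model, replacing link with merge makes the violation disappear by
construction, which is why the patch is only a mechanical fix: it does not show
that the two groups really describe the same plan node.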
I likely do not have the cycles to take the investigation further and build up
the domain knowledge needed to fix HIVE-24167 (or to investigate why the plan
explodes without "hive.optimize.cte.materialize.full.aggregate.only" set to
false).
I've attached the test files for both the reduced-column version and the
full-column version.
However, I will highlight this to some folks to see whether they can spend some
time soon investigating this JIRA and HIVE-24167.
> OOM happens when selecting many columns and JOIN.
> --------------------------------------------------
>
> Key: HIVE-27858
> URL: https://issues.apache.org/jira/browse/HIVE-27858
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Affects Versions: 4.0.0-beta-1
> Reporter: Ryu Kobayashi
> Assignee: John Sherman
> Priority: Critical
> Labels: hive-4.0.0-must
> Attachments: ddl.sql, query.sql
>
>
> An OOM happens when executing [^query.sql] using a table in [^ddl.sql]. This
> did not happen in Hive 2.