[ https://issues.apache.org/jira/browse/HIVE-27858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17794895#comment-17794895 ]
John Sherman commented on HIVE-27858:
-------------------------------------

I did some preliminary investigation here. If I remove all the columns except the ones being joined or operated on, the query gets past the point of generating the initial CBO plan. That plan is 307402 lines long. (I had to CTRL-C because it was taking more than 10 minutes.)
{code:java}
apache/hive master > cat hive.plan.log | wc -l
307402
{code}
I thought the plan might be so complex because of the recursive WITHs and because the CTEs are not being materialized, so I set:
{code:java}
set hive.optimize.cte.materialize.full.aggregate.only=false;
{code}
in the hope that it would generate a better plan. It then fails much more quickly with:
{code:java}
java.lang.RuntimeException: equivalence mapping violation
{code}
which is https://issues.apache.org/jira/browse/HIVE-24167

If I mechanically fix HIVE-24167 (I don't have the domain knowledge to know whether the change is correct in all cases) by:
{code:java}
diff --git a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java b/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
index d7744587e6..5338bb61e0 100644
--- a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
+++ b/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java
@@ -3273,7 +3273,7 @@ private static Statistics applyRuntimeStats(Context context, Statistics stats, O

     PlanMapper pm = context.getPlanMapper();
     OpTreeSignature treeSig = pm.getSignatureOf(op);
-    pm.link(op, treeSig);
+    pm.merge(op, treeSig);

     StatsSource statsSource = context.getStatsSource();
     if (!statsSource.canProvideStatsFor(op.getClass())) {
diff --git a/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
index 0823b6d9ba..0de4e83624 100644
--- a/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
+++ b/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
@@ -3767,8 +3767,7 @@ private Operator genFilterPlan(QB qb, ASTNode condn, Operator input, boolean use
     Operator output = putOpInsertMap(OperatorFactory.getAndMakeChild(
         new FilterDesc(filterCond, false), new RowSchema(
             inputRR.getColumnInfos()), input), inputRR);
-
-    ctx.getPlanMapper().link(condn, output);
+    ctx.getPlanMapper().merge(condn, output);

     LOG.debug("Created Filter Plan for {} row schema: {}", qb.getId(), inputRR.toString());
     return output;
{code}
then the reduced-column query compiles successfully, and so does the query with all the columns.

I likely do not have the cycles to take the investigation further and build up the domain knowledge to fix HIVE-24167 (or to investigate why the plan explodes without "hive.optimize.cte.materialize.full.aggregate.only"). I've attached the test files for both the reduced-column version and the full-column version. However, I will highlight this to some folks to see whether they can spend some time soon investigating this JIRA and HIVE-24167.

> OOM happens when selecting many columns and JOIN.
> --------------------------------------------------
>
>                 Key: HIVE-27858
>                 URL: https://issues.apache.org/jira/browse/HIVE-27858
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 4.0.0-beta-1
>            Reporter: Ryu Kobayashi
>            Assignee: John Sherman
>            Priority: Critical
>              Labels: hive-4.0.0-must
>         Attachments: ddl.sql, query.sql
>
>
> OOM happens when executing [^query.sql] using a table in [^ddl.sql]. These
> did not happen in Hive 2 previously.