[
https://issues.apache.org/jira/browse/HIVE-24167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828939#comment-17828939
]
okumin commented on HIVE-24167:
-------------------------------
[~zabetak]
No, I haven't. I am guessing the problem is a little more difficult than that.
The final purpose of PlanMapper is to propagate runtime stats into all
equivalent Calcite RelNodes and Hive Operators across application attempts.
[The propagation can happen even across queries through HS2 or
HMS|https://github.com/apache/hive/blob/rel/release-4.0.0-beta-1/ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/StatsSources.java#L52-L61].
We bind runtime stats to RelNodes or Operators in [a Hive
hook|https://github.com/apache/hive/blob/rel/release-4.0.0-beta-1/ql/src/java/org/apache/hadoop/hive/ql/stats/OperatorStatsReaderHook.java#L77].
So, the lifetime of Context/PlanMapper must equal that of a Hive query, which
is longer than the lifetime of a materialized CTE (= a Tez DAG).
We would therefore need a small modification if we apply that approach. Here
are some of my ideas.
* Make Context : PlanMapper = 1 : N. We create a new PlanMapper per
materialized CTE and retain all mappers for the entire query. When
OperatorStatsReaderHook links the runtime stats, it tries to propagate them
to all PlanMappers (maybe via signatures, or via Operator ids if that is
difficult).
* [Tag each entry in PlanMapper with the name of its materialized
CTE|https://gist.github.com/okumin/b111fe0a911507bdf6a7204f49b9cb72#give-separate-namespaces-to-each-cte],
keeping Context : PlanMapper = 1 : 1. The basic idea is the same as the first
one.
I expect that either approach would prevent SemanticAnalyzers from over-linking
RelNodes or Operators across materialized CTEs at compile time, while still
allowing OperatorStatsReaderHook to link stats to all Operators at the end of a
query.
To be honest, I can't claim 100% confidence or offer evidence yet, as the
related code is difficult. I will try it if we agree that the above approaches
make the most sense.
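To make the namespace idea concrete, here is a minimal, self-contained sketch. It is not Hive code: NamespacedPlanMapper, Group, link, bindRuntimeStats and lookup are hypothetical names, plan nodes and signatures are modelled as plain strings, and the empty namespace is assumed to stand for the main query. It only illustrates the intended behaviour: linking is scoped to a single materialized CTE at compile time, while runtime stats are propagated by signature to every namespace at the end of the query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for PlanMapper; none of these names are Hive APIs.
final class NamespacedPlanMapper {
    // A group collects the plan nodes considered equivalent at compile time.
    static final class Group {
        final String namespace;  // materialized CTE name, "" for the main query (assumption)
        final String signature;  // stand-in for an operator signature
        final List<String> members = new ArrayList<>();
        Long runtimeRows;        // null until runtime stats are bound

        Group(String namespace, String signature) {
            this.namespace = namespace;
            this.signature = signature;
        }
    }

    // namespace -> (signature -> group)
    private final Map<String, Map<String, Group>> groupsByNamespace = new HashMap<>();

    // Compile time: a node is linked only to nodes in the same namespace.
    Group link(String namespace, String signature, String node) {
        Group group = groupsByNamespace
            .computeIfAbsent(namespace, ns -> new HashMap<>())
            .computeIfAbsent(signature, sig -> new Group(namespace, sig));
        group.members.add(node);
        return group;
    }

    // End of query: propagate stats to every namespace holding this signature.
    // Returns the number of groups that were updated.
    int bindRuntimeStats(String signature, long rows) {
        int updated = 0;
        for (Map<String, Group> groups : groupsByNamespace.values()) {
            Group group = groups.get(signature);
            if (group != null) {
                group.runtimeRows = rows;
                updated++;
            }
        }
        return updated;
    }

    // Returns the group for the given namespace and signature, or null if absent.
    Group lookup(String namespace, String signature) {
        Map<String, Group> groups = groupsByNamespace.get(namespace);
        return groups == null ? null : groups.get(signature);
    }
}
```

With this shape, two scans with the same signature in different CTEs stay in separate groups, yet a single call to bindRuntimeStats updates both. The 1 : N variant would look similar, with one mapper object per namespace instead of one nested map.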
> TPC-DS query 14 fails while generating plan for the filter
> ----------------------------------------------------------
>
> Key: HIVE-24167
> URL: https://issues.apache.org/jira/browse/HIVE-24167
> Project: Hive
> Issue Type: Sub-task
> Components: CBO
> Reporter: Stamatis Zampetakis
> Assignee: okumin
> Priority: Major
> Labels: hive-4.1.0-must, pull-request-available
>
> TPC-DS query 14 (cbo_query14.q and query4.q) fails with an NPE on the
> metastore with the partitioned TPC-DS 30TB dataset while generating the plan
> for the filter.
> The problem can be reproduced using the PR in HIVE-23965.
> The current stacktrace shows that the NPE appears while trying to display the
> debug message but even if this line didn't exist it would fail again later on.
> {noformat}
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10867)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11765)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11635)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlanForSubQueryPredicate(SemanticAnalyzer.java:3375)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:3473)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10819)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11765)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11625)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11625)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11635)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12417)
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:718)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12519)
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:443)
> at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
> at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171)
> at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220)
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:173)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:414)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:363)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:357)
> at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:129)
> at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:231)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355)
> at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:740)
> at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:710)
> at org.apache.hadoop.hive.cli.control.CorePerfCliDriver.runTest(CorePerfCliDriver.java:103)
> at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
> at org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver.testCliDriver(TestTezTPCDS30TBPerfCliDriver.java:83)
> {noformat}