[jira] [Commented] (HIVE-24167) TPC-DS query 14 fails while generating plan for the filter

okumin (Jira) Wed, 20 Mar 2024 06:09:04 -0700


    [ 
https://issues.apache.org/jira/browse/HIVE-24167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828939#comment-17828939
 ]


okumin commented on HIVE-24167:
-------------------------------

[~zabetak]

No, I haven't. I suspect the problem is a little more difficult than that. 
The ultimate purpose of PlanMapper is to propagate runtime stats to all 
equivalent Calcite RelNodes and Hive Operators across application attempts. 
[The propagation can happen even across queries through HS2 or 
HMS|https://github.com/apache/hive/blob/rel/release-4.0.0-beta-1/ql/src/java/org/apache/hadoop/hive/ql/plan/mapper/StatsSources.java#L52-L61].

We bind runtime stats to RelNodes or Operators in [a Hive 
hook|https://github.com/apache/hive/blob/rel/release-4.0.0-beta-1/ql/src/java/org/apache/hadoop/hive/ql/stats/OperatorStatsReaderHook.java#L77].
 So, the lifetime of Context/PlanMapper must equal that of a Hive query, which 
is longer than the lifetime of a materialized CTE (= a Tez DAG).

So, we would need to make a small modification if we apply that approach. Here 
are some of my ideas.
 * Make Context : PlanMapper = 1 : N. We create a new PlanMapper per 
materialized CTE and retain all mappers for the entire query. When 
OperatorStatsReaderHook links the runtime stats, it tries to propagate them to 
all PlanMappers (maybe via signatures, or via Operator ids if that is difficult).
 * [Tag each entry in PlanMapper with the name of its materialized 
CTE|https://gist.github.com/okumin/b111fe0a911507bdf6a7204f49b9cb72#give-separate-namespaces-to-each-cte],
 keeping Context : PlanMapper = 1 : 1. The basic idea is the same as the first 
one.

I expect that either approach prevents SemanticAnalyzers from over-linking 
RelNodes or Operators across materialized CTEs at compile time, and still allows 
OperatorStatsReaderHook to load stats for all Operators at the end of a query.
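
To make the second idea concrete, here is a minimal sketch of namespace tagging. 
It is a toy model, not Hive's real PlanMapper: the class, its methods, the node 
ids, and the signature strings are all hypothetical and only illustrate the 
intended behavior. Linking is confined to one namespace (one materialized CTE, 
or "" for the main query), while stats binding walks every namespace and matches 
by signature.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only. This is NOT Hive's PlanMapper; every name here is hypothetical. */
public class NamespacedPlanMapperSketch {

    /** An equivalence group of plan nodes (RelNodes/Operators) that share one signature. */
    public static final class Group {
        public final List<String> members = new ArrayList<>();
        public Long runtimeRowCount; // stays null until runtime stats are bound
    }

    // namespace (CTE name, or "" for the main query) -> signature -> group
    private final Map<String, Map<String, Group>> groupsByNamespace = new HashMap<>();

    /** Compile time: a node is only grouped with nodes from the same namespace. */
    public void link(String namespace, String signature, String nodeId) {
        groupsByNamespace
            .computeIfAbsent(namespace, ns -> new HashMap<>())
            .computeIfAbsent(signature, sig -> new Group())
            .members.add(nodeId);
    }

    /** End of query: bind stats to the matching group in every namespace. */
    public int bindRuntimeStats(String signature, long rowCount) {
        int updated = 0;
        for (Map<String, Group> groups : groupsByNamespace.values()) {
            Group group = groups.get(signature);
            if (group != null) {
                group.runtimeRowCount = rowCount;
                updated++;
            }
        }
        return updated;
    }

    /** Returns the group for a namespace and signature, or null if none was linked. */
    public Group group(String namespace, String signature) {
        Map<String, Group> groups = groupsByNamespace.get(namespace);
        return groups == null ? null : groups.get(signature);
    }

    public static void main(String[] args) {
        NamespacedPlanMapperSketch mapper = new NamespacedPlanMapperSketch();
        // Two nodes inside the same CTE share a group.
        mapper.link("cte1", "TS[store_sales]", "TS_1");
        mapper.link("cte1", "TS[store_sales]", "HiveTableScan#10");
        // The same signature in the main query gets its own group.
        mapper.link("", "TS[store_sales]", "TS_7");

        int updated = mapper.bindRuntimeStats("TS[store_sales]", 1000L);
        System.out.println(updated + " groups updated"); // prints "2 groups updated"
    }
}
```

The first idea (Context : PlanMapper = 1 : N) would look the same from the 
outside if each inner map were a separate mapper object held by Context.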

To be honest, I can't claim 100% confidence or present evidence, as the related 
code is difficult. I will try it if the above approaches make sense to us.

> TPC-DS query 14 fails while generating plan for the filter
> ----------------------------------------------------------
>
>                 Key: HIVE-24167
>                 URL: https://issues.apache.org/jira/browse/HIVE-24167
>             Project: Hive
>          Issue Type: Sub-task
>          Components: CBO
>            Reporter: Stamatis Zampetakis
>            Assignee: okumin
>            Priority: Major
>              Labels: hive-4.1.0-must, pull-request-available
>
> TPC-DS query 14 (cbo_query14.q and query4.q) fail with NPE on the metastore 
> with the partitioned TPC-DS 30TB dataset while generating the plan for the 
> filter.
> The problem can be reproduced using the PR in HIVE-23965.
> The current stacktrace shows that the NPE appears while trying to display the 
> debug message but even if this line didn't exist it would fail again later on.
> {noformat}
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10867)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11765)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11635)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlanForSubQueryPredicate(SemanticAnalyzer.java:3375)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:3473)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10819)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11765)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11625)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11625)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11635)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12417)
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:718)
>         at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12519)
>         at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:443)
>         at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
>         at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171)
>         at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301)
>         at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220)
>         at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:173)
>         at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:414)
>         at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:363)
>         at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:357)
>         at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:129)
>         at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:231)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:203)
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:129)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:355)
>         at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:740)
>         at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:710)
>         at 
> org.apache.hadoop.hive.cli.control.CorePerfCliDriver.runTest(CorePerfCliDriver.java:103)
>         at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
>         at 
> org.apache.hadoop.hive.cli.TestTezTPCDS30TBPerfCliDriver.testCliDriver(TestTezTPCDS30TBPerfCliDriver.java:83)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
