[ https://issues.apache.org/jira/browse/HIVE-21799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16850168#comment-16850168 ]
Jason Dere commented on HIVE-21799:
-----------------------------------
The GroupByDesc does not appear to have the column mapping information for
the join key when the join key is an aggregation column (it works when the
join key is one of the key columns of the GroupBy).
A simple fix is to skip the dynamic semijoin reduction optimization if the
column mapping for the join key cannot be found in the GroupBy (parentOfRS).
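As a rough illustration of the proposed guard (not the actual patch), the sketch below models the GroupBy column mapping as a plain Map. The class name, method names, and column names (_col0, _col1) are hypothetical and do not come from the Hive code base; the only point is that a missing mapping causes the optimization to be skipped instead of dereferencing null.

```java
import java.util.HashMap;
import java.util.Map;

public class SemiJoinGuardSketch {

    /**
     * Hypothetical stand-in for the column mapping of the GroupBy (parentOfRS).
     * Only the grouping key has an entry; the aggregation output has none,
     * which mirrors the situation described in the comment above.
     */
    public static Map<String, String> groupByColumnMapping() {
        Map<String, String> mapping = new HashMap<>();
        mapping.put("_col0", "id"); // key column of the GroupBy
        // "_col1" (max(eventid)) is deliberately absent.
        return mapping;
    }

    /**
     * Returns true if semijoin reduction may be attempted for the join key,
     * false if it should be skipped because no column mapping is available.
     */
    public static boolean canApplySemiJoinReduction(Map<String, String> columnMapping,
                                                    String joinKeyColumn) {
        String source = columnMapping.get(joinKeyColumn);
        if (source == null) {
            // No mapping for the join key: skip the optimization
            // rather than fail later with a NullPointerException.
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = groupByColumnMapping();
        System.out.println("join on key column: "
            + canApplySemiJoinReduction(mapping, "_col0"));
        System.out.println("join on aggregation column: "
            + canApplySemiJoinReduction(mapping, "_col1"));
    }
}
```

Skipping the optimization only gives up a possible runtime filter; the query plan itself stays valid, which is why this is the conservative choice here.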
> NullPointerException in DynamicPartitionPruningOptimization, when join key is on aggregation column
> ---------------------------------------------------------------------------------------------------
>
> Key: HIVE-21799
> URL: https://issues.apache.org/jira/browse/HIVE-21799
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Reporter: Jason Dere
> Assignee: Jason Dere
> Priority: Major
>
> Following table/query results in NPE:
> {noformat}
> create table tez_no_dynpart_hashjoin_on_agg(id int, outcome string, eventid int) stored as orc;
> explain select a.id, b.outcome from (select id, max(eventid) as event_id_max from tez_no_dynpart_hashjoin_on_agg group by id) a
> LEFT OUTER JOIN tez_no_dynpart_hashjoin_on_agg b
> on a.event_id_max = b.eventid;
> {noformat}
> Stack trace:
> {noformat}
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan(DynamicPartitionPruningOptimization.java:608)
> at org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.process(DynamicPartitionPruningOptimization.java:239)
> at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
> at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:74)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
> at org.apache.hadoop.hive.ql.parse.TezCompiler.runDynamicPartitionPruning(TezCompiler.java:584)
> at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:165)
> at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:159)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12562)
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:370)
> at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:289)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:671)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1905)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1852)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1847)
> at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
> at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:219)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:242)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:189)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:340)
> at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:676)
> at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:647)
> at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:182)
> at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
> at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:59)
> {noformat}