[ https://issues.apache.org/jira/browse/HIVE-21799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855299#comment-16855299 ]
Vineet Garg commented on HIVE-21799:
------------------------------------
The existing logic for obtaining the {{ExprNode}} for a given column name seems wrong.
{code:java}
ColumnInfo columnInfo =
    parentOfRS.getSchema().getColumnInfo(internalColName);
{code}
The above retrieves the column info from {{parentOfRS}}'s output schema, assuming {{parentOfRS}} emits a column named {{internalColName}}.
{code:java}
ExprNodeDesc exprNode = null;
if (parentOfRS.getColumnExprMap() != null) {
  exprNode = parentOfRS.getColumnExprMap().get(internalColName).clone();
} else {
  exprNode = new ExprNodeColumnDesc(columnInfo);
}
{code}
But this logic looks up the same column name {{internalColName}} in
{{columnExprMap}}, which is a mapping from {{parentOfRS}}'s input column names to
the corresponding expressions {{parentOfRS}} will emit. That lookup works only
if {{parentOfRS}} does not change the input column and outputs it as is.
Assuming {{internalColName}} refers to the column coming out of
{{parentOfRS}}, this should just be
{code:java}
exprNode = new ExprNodeColumnDesc(columnInfo);
{code}
I believe this change will also fix the issue reported here. In fact, it should go
ahead and create the semijoin instead of returning.
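For illustration, here is a minimal, self-contained Java sketch of the problem described above. The {{ColumnExpr}} class and the map contents are hypothetical stand-ins, not Hive classes: it shows why a lookup of the output column name in a map keyed by input column names returns {{null}} (and would then NPE on {{.clone()}}) when the operator renames its columns, while building the expression from the output schema's column info always succeeds.
{code:java}
import java.util.HashMap;
import java.util.Map;

public class ColumnLookupSketch {
  // Hypothetical stand-in for Hive's ExprNodeColumnDesc: just wraps a column name.
  static final class ColumnExpr {
    final String column;
    ColumnExpr(String column) { this.column = column; }
  }

  // Stand-in for parentOfRS.getColumnExprMap(): keyed by the operator's
  // *input* column names, per the reading above.
  static final Map<String, ColumnExpr> columnExprMap = new HashMap<>();
  static {
    columnExprMap.put("eventid", new ColumnExpr("eventid"));
  }

  // Existing path: looks up the *output* column name in a map keyed by
  // input names. Returns null when the operator renamed the column
  // (e.g. max(eventid) emitted as _col1), which leads to the NPE on .clone().
  static ColumnExpr lookupViaExprMap(String internalColName) {
    return columnExprMap.get(internalColName);
  }

  // Proposed path: build the expression directly from the output schema's
  // column info, which is always present for an emitted column.
  static ColumnExpr buildFromSchema(String internalColName) {
    return new ColumnExpr(internalColName);
  }
}
{code}
With this setup, {{lookupViaExprMap("_col1")}} returns {{null}} while {{buildFromSchema("_col1")}} yields a usable column expression, mirroring the proposed one-line fix.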
> NullPointerException in DynamicPartitionPruningOptimization, when join key is on aggregation column
> ---------------------------------------------------------------------------------------------------
>
> Key: HIVE-21799
> URL: https://issues.apache.org/jira/browse/HIVE-21799
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Reporter: Jason Dere
> Assignee: Jason Dere
> Priority: Major
> Attachments: HIVE-21799.1.patch, HIVE-21799.2.patch, HIVE-21799.3.patch
>
>
> Following table/query results in NPE:
> {noformat}
> create table tez_no_dynpart_hashjoin_on_agg(id int, outcome string, eventid int) stored as orc;
> explain select a.id, b.outcome from (select id, max(eventid) as event_id_max from tez_no_dynpart_hashjoin_on_agg group by id) a
> LEFT OUTER JOIN tez_no_dynpart_hashjoin_on_agg b
> on a.event_id_max = b.eventid;
> {noformat}
> Stack trace:
> {noformat}
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan(DynamicPartitionPruningOptimization.java:608)
> at org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.process(DynamicPartitionPruningOptimization.java:239)
> at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
> at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:74)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
> at org.apache.hadoop.hive.ql.parse.TezCompiler.runDynamicPartitionPruning(TezCompiler.java:584)
> at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:165)
> at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:159)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12562)
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:370)
> at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:289)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:671)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1905)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1852)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1847)
> at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
> at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:219)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:242)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:189)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:340)
> at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:676)
> at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:647)
> at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:182)
> at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
> at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:59)
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)