[
https://issues.apache.org/jira/browse/HIVE-26006?focusedWorklogId=763419&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-763419
]
ASF GitHub Bot logged work on HIVE-26006:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 28/Apr/22 10:40
Start Date: 28/Apr/22 10:40
Worklog Time Spent: 10m
Work Description: zabetak commented on code in PR #3082:
URL: https://github.com/apache/hive/pull/3082#discussion_r860699101
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyPushdownProcessor.java:
##########
@@ -244,13 +223,35 @@ private void pushdownThroughLeftOuterJoin(TopNKeyOperator topNKey) throws Semant
reduceSinkDesc.getColumnExprMap(),
reduceSinkDesc.getOrder(),
reduceSinkDesc.getNullOrder());
+
+ pushDownThrough(commonKeyPrefix, topNKey, join, reduceSinkOperator);
+ }
+
+ private <T extends AbstractOperatorDesc> void pushDownThrough(
+ CommonKeyPrefix commonKeyPrefix, TopNKeyOperator topNKey, Operator<T> operator)
+ throws SemanticException {
+
+ pushDownThrough(commonKeyPrefix, topNKey, operator, operator);
+ }
+
+ private <TDesc extends AbstractOperatorDesc, TParentDesc extends AbstractOperatorDesc> void pushDownThrough(
+ CommonKeyPrefix commonKeyPrefix, TopNKeyOperator topNKey,
+ Operator<TDesc> join, Operator<TParentDesc> reduceSinkOperator)
+ throws SemanticException {
+
+ final TopNKeyDesc topNKeyDesc = topNKey.getConf();
if (commonKeyPrefix.isEmpty() || commonKeyPrefix.size() == topNKeyDesc.getPartitionKeyColumns().size()) {
return;
}
+ final TopNKeyDesc newTopNKeyDesc = topNKeyDesc.combine(commonKeyPrefix);
+ if (newTopNKeyDesc.getKeyColumns().size() > 0 &&
+ newTopNKeyDesc.getKeyColumns().size() <= newTopNKeyDesc.getPartitionKeyColumns().size()) {
Review Comment:
Do we need to create the new `TopNKeyDesc` to do this check? Don't we
already have all the info?
Can you add more comments on why we need to bail out?
Do we have test coverage for this case? In other words, does any existing
test enter this new if statement?
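
To illustrate the first question, here is a minimal sketch, assuming (as the `combine` hunk in this PR suggests) that the combined descriptor's key columns are exactly the mapped columns of the common key prefix and that its partition key columns are a clamped prefix of them. The class and method names are hypothetical and plain integers stand in for the Hive types; this is not the code in the PR.

```java
public class BailOutCheckSketch {

    // Hypothetical stand-in for the check done on the combined descriptor:
    // key column count = prefix size,
    // partition key column count = min(original partition key count, prefix size).
    static boolean checkViaCombinedDesc(int prefixSize, int partitionKeyCount) {
        int newKeyCount = prefixSize;
        int newPartitionKeyCount = Math.min(partitionKeyCount, prefixSize);
        return newKeyCount > 0 && newKeyCount <= newPartitionKeyCount;
    }

    // The same decision made from the sizes that are known before combining.
    static boolean checkDirectly(int prefixSize, int partitionKeyCount) {
        return prefixSize > 0 && prefixSize <= partitionKeyCount;
    }

    public static void main(String[] args) {
        // Compare both variants over a small range of sizes.
        for (int prefix = 0; prefix <= 5; prefix++) {
            for (int partition = 0; partition <= 5; partition++) {
                boolean viaDesc = checkViaCombinedDesc(prefix, partition);
                boolean direct = checkDirectly(prefix, partition);
                if (viaDesc != direct) {
                    throw new IllegalStateException(
                        "mismatch at prefix=" + prefix + ", partition=" + partition);
                }
            }
        }
        System.out.println("both checks agree on all tested sizes");
    }
}
```

Under those assumptions the two variants agree, because the prefix size is never larger than itself, so only the comparison against the original partition key count matters.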
##########
ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java:
##########
@@ -252,7 +252,8 @@ public TopNKeyDescExplainVectorization getTopNKeyVectorization() {
public TopNKeyDesc combine(CommonKeyPrefix commonKeyPrefix) {
return new TopNKeyDesc(topN, commonKeyPrefix.getMappedOrder(),
commonKeyPrefix.getMappedNullOrder(),
commonKeyPrefix.getMappedColumns(),
- commonKeyPrefix.getMappedColumns().subList(0, partitionKeyColumns.size()),
+ commonKeyPrefix.getMappedColumns()
+ .subList(0, Math.min(partitionKeyColumns.size(), commonKeyPrefix.getMappedColumns().size())),
Review Comment:
This is the main part of the fix, right? Is the rest mostly refactoring to
take advantage of the new bail-out condition?
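
For context, a minimal, self-contained sketch of the failure mode and the clamp, using only `java.util` lists. The class, method, and column names are hypothetical stand-ins; only the `subList` and `Math.min` pattern is taken from the hunk above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubListClampSketch {

    // Mirrors the old expression: fails when there are more partition keys
    // than mapped columns in the common key prefix.
    static <T> List<T> unclampedPrefix(List<T> mappedColumns, int partitionKeyCount) {
        return mappedColumns.subList(0, partitionKeyCount);
    }

    // Mirrors the new expression: the end index never exceeds the list size.
    static <T> List<T> clampedPrefix(List<T> mappedColumns, int partitionKeyCount) {
        return mappedColumns.subList(0, Math.min(partitionKeyCount, mappedColumns.size()));
    }

    public static void main(String[] args) {
        // One mapped column, but two partition key columns (hypothetical names).
        List<String> mappedColumns = new ArrayList<>(Arrays.asList("col_a"));
        int partitionKeyCount = 2;

        try {
            unclampedPrefix(mappedColumns, partitionKeyCount);
            System.out.println("no exception (unexpected)");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unclamped subList failed: " + e.getMessage());
        }

        System.out.println("clamped subList: " + clampedPrefix(mappedColumns, partitionKeyCount));
    }
}
```

The clamp only prevents the exception; whether a shorter partition key list is still a valid input for the pushdown is a separate question, which is presumably what the new bail-out condition addresses.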
##########
ql/src/test/queries/clientpositive/ptf_tnk.q:
##########
@@ -0,0 +1,22 @@
+CREATE EXTERNAL TABLE t1(
Review Comment:
Would it be possible to also load some data and verify that the results of
the query are correct?
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyPushdownProcessor.java:
##########
@@ -244,13 +223,35 @@ private void pushdownThroughLeftOuterJoin(TopNKeyOperator topNKey) throws Semant
reduceSinkDesc.getColumnExprMap(),
reduceSinkDesc.getOrder(),
reduceSinkDesc.getNullOrder());
+
+ pushDownThrough(commonKeyPrefix, topNKey, join, reduceSinkOperator);
+ }
+
+ private <T extends AbstractOperatorDesc> void pushDownThrough(
+ CommonKeyPrefix commonKeyPrefix, TopNKeyOperator topNKey, Operator<T> operator)
+ throws SemanticException {
+
+ pushDownThrough(commonKeyPrefix, topNKey, operator, operator);
+ }
+
+ private <TDesc extends AbstractOperatorDesc, TParentDesc extends AbstractOperatorDesc> void pushDownThrough(
+ CommonKeyPrefix commonKeyPrefix, TopNKeyOperator topNKey,
+ Operator<TDesc> join, Operator<TParentDesc> reduceSinkOperator)
Review Comment:
Are the operators here strictly a join and reduce sink? From the code I get
the impression that there are more options. Should we pick more descriptive
names?
Issue Time Tracking
-------------------
Worklog Id: (was: 763419)
Time Spent: 20m (was: 10m)
> TopNKey and PTF with more than one column is failing with IOBE
> --------------------------------------------------------------
>
> Key: HIVE-26006
> URL: https://issues.apache.org/jira/browse/HIVE-26006
> Project: Hive
> Issue Type: Bug
> Reporter: Naresh P R
> Assignee: Krisztian Kasa
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> {code:java}
> java.lang.IndexOutOfBoundsException: toIndex = 2
> at java.util.ArrayList.subListRangeCheck(ArrayList.java:1014)
> at java.util.ArrayList.subList(ArrayList.java:1006)
> at org.apache.hadoop.hive.ql.plan.TopNKeyDesc.combine(TopNKeyDesc.java:201)
> at org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.pushdownThroughGroupBy(TopNKeyPushdownProcessor.java:162)
> at org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.pushdown(TopNKeyPushdownProcessor.java:76)
> at org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.process(TopNKeyPushdownProcessor.java:57)
> at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
> at org.apache.hadoop.hive.ql.parse.TezCompiler.runTopNKeyOptimization(TezCompiler.java:1305)
> at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:173)
> at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:159)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12646)
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:358)
> at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:283)
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:219)
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:103)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:215){code}