[jira] [Work logged] (HIVE-26006) TopNKey and PTF with more than one column is failing with IOBE

ASF GitHub Bot (Jira) Thu, 28 Apr 2022 03:41:05 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-26006?focusedWorklogId=763419&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-763419
 ]


ASF GitHub Bot logged work on HIVE-26006:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Apr/22 10:40
            Start Date: 28/Apr/22 10:40
    Worklog Time Spent: 10m 
      Work Description: zabetak commented on code in PR #3082:
URL: https://github.com/apache/hive/pull/3082#discussion_r860699101


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyPushdownProcessor.java:
##########
@@ -244,13 +223,35 @@ private void pushdownThroughLeftOuterJoin(TopNKeyOperator 
topNKey) throws Semant
             reduceSinkDesc.getColumnExprMap(),
             reduceSinkDesc.getOrder(),
             reduceSinkDesc.getNullOrder());
+
+    pushDownThrough(commonKeyPrefix, topNKey, join, reduceSinkOperator);
+  }
+
+  private <T extends AbstractOperatorDesc> void pushDownThrough(
+          CommonKeyPrefix commonKeyPrefix, TopNKeyOperator topNKey, 
Operator<T> operator)
+          throws SemanticException {
+
+    pushDownThrough(commonKeyPrefix, topNKey, operator, operator);
+  }
+
+  private <TDesc extends AbstractOperatorDesc, TParentDesc extends 
AbstractOperatorDesc> void pushDownThrough(
+          CommonKeyPrefix commonKeyPrefix, TopNKeyOperator topNKey,
+          Operator<TDesc> join, Operator<TParentDesc> reduceSinkOperator)
+          throws SemanticException {
+
+    final TopNKeyDesc topNKeyDesc = topNKey.getConf();
     if (commonKeyPrefix.isEmpty() || commonKeyPrefix.size() == 
topNKeyDesc.getPartitionKeyColumns().size()) {
       return;
     }
 
+    final TopNKeyDesc newTopNKeyDesc = topNKeyDesc.combine(commonKeyPrefix);
+    if (newTopNKeyDesc.getKeyColumns().size() > 0 &&
+            newTopNKeyDesc.getKeyColumns().size() <= 
newTopNKeyDesc.getPartitionKeyColumns().size()) {

Review Comment:
   Do we need to create the new `TopNKeyDesc` to do this check? Don't we have 
already all the info?
   
   Can you add more comments on why we need to bail out.
   
   Do we have test coverage for this case. In other words does existing test 
enter this new if statement?



##########
ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java:
##########
@@ -252,7 +252,8 @@ public TopNKeyDescExplainVectorization 
getTopNKeyVectorization() {
   public TopNKeyDesc combine(CommonKeyPrefix commonKeyPrefix) {
     return new TopNKeyDesc(topN, commonKeyPrefix.getMappedOrder(),
             commonKeyPrefix.getMappedNullOrder(), 
commonKeyPrefix.getMappedColumns(),
-            commonKeyPrefix.getMappedColumns().subList(0, 
partitionKeyColumns.size()),
+            commonKeyPrefix.getMappedColumns()
+                    .subList(0, Math.min(partitionKeyColumns.size(), 
commonKeyPrefix.getMappedColumns().size())),

Review Comment:
   This is the main part of the fix right? The rest is mostly refactoring to 
take advantage of the new bail-out condition?



##########
ql/src/test/queries/clientpositive/ptf_tnk.q:
##########
@@ -0,0 +1,22 @@
+CREATE EXTERNAL TABLE t1(

Review Comment:
   Would it be possible to also load some data and verify that the results of 
the query are correct?



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyPushdownProcessor.java:
##########
@@ -244,13 +223,35 @@ private void pushdownThroughLeftOuterJoin(TopNKeyOperator 
topNKey) throws Semant
             reduceSinkDesc.getColumnExprMap(),
             reduceSinkDesc.getOrder(),
             reduceSinkDesc.getNullOrder());
+
+    pushDownThrough(commonKeyPrefix, topNKey, join, reduceSinkOperator);
+  }
+
+  private <T extends AbstractOperatorDesc> void pushDownThrough(
+          CommonKeyPrefix commonKeyPrefix, TopNKeyOperator topNKey, 
Operator<T> operator)
+          throws SemanticException {
+
+    pushDownThrough(commonKeyPrefix, topNKey, operator, operator);
+  }
+
+  private <TDesc extends AbstractOperatorDesc, TParentDesc extends 
AbstractOperatorDesc> void pushDownThrough(
+          CommonKeyPrefix commonKeyPrefix, TopNKeyOperator topNKey,
+          Operator<TDesc> join, Operator<TParentDesc> reduceSinkOperator)

Review Comment:
   Are the operators here strictly a join and reduce sink? From the code I get 
the impression that there are more options. Should we pick more descriptive 
names?





Issue Time Tracking
-------------------

    Worklog Id:     (was: 763419)
    Time Spent: 20m  (was: 10m)

> TopNKey and PTF with more than one column is failing with IOBE
> --------------------------------------------------------------
>
>                 Key: HIVE-26006
>                 URL: https://issues.apache.org/jira/browse/HIVE-26006
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Naresh P R
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code:java}
> java.lang.IndexOutOfBoundsException: toIndex = 2
> at java.util.ArrayList.subListRangeCheck(ArrayList.java:1014)
> at java.util.ArrayList.subList(ArrayList.java:1006)
> at org.apache.hadoop.hive.ql.plan.TopNKeyDesc.combine(TopNKeyDesc.java:201)
> at 
> org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.pushdownThroughGroupBy(TopNKeyPushdownProcessor.java:162)
> at 
> org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.pushdown(TopNKeyPushdownProcessor.java:76)
> at 
> org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.process(TopNKeyPushdownProcessor.java:57)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
> at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.runTopNKeyOptimization(TezCompiler.java:1305)
> at 
> org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:173)
> at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:159)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12646)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:358)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:283)
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:219)
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:103)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:215){code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

[jira] [Work logged] (HIVE-26006) TopNKey and PTF with more than one column is failing with IOBE

Reply via email to