[jira] [Work logged] (HIVE-25589) SQL: Implement HAVING/QUALIFY predicates for ROW_NUMBER()=1

ASF GitHub Bot (Jira) Mon, 23 May 2022 02:20:06 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-25589?focusedWorklogId=773375&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-773375
 ]


ASF GitHub Bot logged work on HIVE-25589:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 23/May/22 09:19
            Start Date: 23/May/22 09:19
    Worklog Time Spent: 10m 
      Work Description: kasakrisz commented on code in PR #3266:
URL: https://github.com/apache/hive/pull/3266#discussion_r879213299


##########
ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java:
##########
@@ -4779,34 +4779,29 @@ && isRegex(
               throw new SemanticException(SemanticAnalyzer.generateErrorMessage(obAST, error));
             }
           }
-          List<RexNode> originalInputRefs = Lists.transform(srcRel.getRowType().getFieldList(),
-              new Function<RelDataTypeField, RexNode>() {
-                @Override
-                public RexNode apply(RelDataTypeField input) {
-                  return new RexInputRef(input.getIndex(), input.getType());
-                }
-              });
-          originalRR = outputRR.duplicate();
-          for (int i = 0; i < inputRR.getColumnInfos().size(); i++) {
-            ColumnInfo colInfo = new ColumnInfo(inputRR.getColumnInfos().get(i));
-            String internalName = SemanticAnalyzer.getColumnInternalName(outputRR.getColumnInfos()
-                .size() + i);
-            colInfo.setInternalName(internalName);
-            // if there is any confict, then we do not generate it in the new select
-            // otherwise, we add it into the calciteColLst and generate the new select
-            if (!outputRR.putWithCheck(colInfo.getTabAlias(), colInfo.getAlias(), internalName,
-                colInfo)) {
-              LOG.trace("Column already present in RR. skipping.");
-            } else {
-              columnList.add(originalInputRefs.get(i));
-            }
-          }
+          originalRR = appendInputColumns(srcRel, columnList, outputRR, inputRR);
           outputRel = genSelectRelNode(columnList, outputRR, srcRel);
           // outputRel is the generated augmented select with extra unselected
           // columns, and originalRR is the original generated select
           return new Pair<RelNode, RowResolver>(outputRel, originalRR);
         } else {
-          outputRel = genSelectRelNode(columnList, outputRR, srcRel);
+          if (qbp.getQualifyExprForClause(dest) != null) {

Review Comment:
   Added a check that throws an exception if CBO is off.





Issue Time Tracking
-------------------

    Worklog Id:     (was: 773375)
    Time Spent: 40m  (was: 0.5h)

> SQL: Implement HAVING/QUALIFY predicates for ROW_NUMBER()=1
> -----------------------------------------------------------
>
>                 Key: HIVE-25589
>                 URL: https://issues.apache.org/jira/browse/HIVE-25589
>             Project: Hive
>          Issue Type: Improvement
>          Components: CBO, SQL
>    Affects Versions: 4.0.0
>            Reporter: Gopal Vijayaraghavan
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Insert queries that filter on ROW_NUMBER() = 1 are inconvenient to write 
> or port from an existing workload, because there is no easy way to ignore a 
> column in this pattern.
> {code}
> INSERT INTO main_table 
> SELECT * from duplicated_table
> QUALIFY ROW_NUMBER() OVER (PARTITION BY event_id) = 1;
> {code}
> needs to be rewritten into
> {code}
> INSERT INTO main_table
> select event_id, event_ts, event_attribute, event_metric1, event_metric2, 
> event_metric3, event_metric4, .., event_metric43 from 
> (select *, ROW_NUMBER() OVER (PARTITION BY event_id) as rnum from 
> duplicated_table)
> where rnum=1;
> {code}
> This rewrite is time-consuming and error-prone (for example, when the 
> column order differs between the source and destination tables).
> An alternative would be to support the same or similar syntax using HAVING. 
> {code}
> INSERT INTO main_table 
> SELECT * from duplicated_table
> HAVING ROW_NUMBER() OVER (PARTITION BY event_id) = 1;
> {code}
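
For readers less familiar with the pattern, the following is a minimal Java sketch of the de-duplication that the ROW_NUMBER() = 1 filter expresses: keep exactly one row per partition key and leave the row's columns untouched. It is an illustration only, not Hive code. The class QualifySketch, the method firstRowPerPartition, and the sample rows are hypothetical names invented for this example. It also assumes rows are non-null and that "row number 1" means the first row in input order; the SQL in the description has no ORDER BY in its window, so the row Hive would actually keep for each event_id is not defined by the query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration; none of these names exist in Hive.
final class QualifySketch {

    // Keeps the first row encountered for each partition key.
    // Assumption: "first" means input order, and rows are non-null.
    static <R, K> List<R> firstRowPerPartition(List<R> rows, Function<R, K> partitionKey) {
        // LinkedHashMap preserves the order in which keys were first seen.
        Map<K, R> firstByKey = new LinkedHashMap<>();
        for (R row : rows) {
            // putIfAbsent ignores later rows that share an already-seen key.
            firstByKey.putIfAbsent(partitionKey.apply(row), row);
        }
        return new ArrayList<>(firstByKey.values());
    }

    public static void main(String[] args) {
        // Each sample row is "event_id:payload"; the key is the part before ':'.
        List<String> duplicated = Arrays.asList("e1:first", "e2:only", "e1:second");
        List<String> deduplicated =
            firstRowPerPartition(duplicated, row -> row.split(":")[0]);
        System.out.println(deduplicated);
    }
}
```

The point of the sketch is that the output rows have the same shape as the input rows. That is what QUALIFY (or the proposed HAVING form) offers over the subquery rewrite, which adds an rnum column and then forces every other column to be listed by name.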



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
