[jira] [Work logged] (HIVE-23817) Pushing TopN Key operator PKFK inner joins

ASF GitHub Bot (Jira) Mon, 20 Jul 2020 21:28:00 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-23817?focusedWorklogId=461405&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-461405
 ]


ASF GitHub Bot logged work on HIVE-23817:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Jul/20 04:26
            Start Date: 21/Jul/20 04:26
    Worklog Time Spent: 10m 
      Work Description: jcamachor commented on a change in pull request #1228:
URL: https://github.com/apache/hive/pull/1228#discussion_r457824052



##########
File path: parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexerParent.g
##########
@@ -471,6 +471,8 @@ Number
     (Digit)+ ( DOT (Digit)* (Exponent)? | Exponent)?
     ;
 
+PKFK_JOIN: 'PKFK_JOIN';

Review comment:
       Can we prefix with `KW_` and move above with rest of keywords?

##########
File path: ql/src/test/queries/clientpositive/topnkey_inner_join.q
##########
@@ -0,0 +1,50 @@
+drop table if exists customer;
+drop table if exists orders;
+
+create table customer (id int, name string, email string);
+create table orders (customer_id int not null enforced, amount int);
+
+alter table customer add constraint pk_customer_id primary key (id) disable 
novalidate rely;
+alter table orders add constraint fk_order_customer_id foreign key 
(customer_id) references customer(id) disable novalidate rely;
+
+insert into customer values
+  (4, 'Heisenberg', '[email protected]'),
+  (3, 'Smith', '[email protected]'),
+  (2, 'Jones', '[email protected]'),
+  (1, 'Robinson', '[email protected]');
+
+insert into orders values
+  (2, 200),
+  (3, 40),
+  (1, 100),
+  (1, 50),
+  (3, 30);
+
+set hive.optimize.topnkey=true;
+set hive.optimize.limittranspose=false;
+
+select 'positive: order by columns are coming from child table';
+-- FIXME: explain select * from customer join orders on customer.id = 
orders.customer_id order by customer.id limit 3;

Review comment:
       I see this example is below. Can FIXME be removed?

##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/ASTConverter.java
##########
@@ -442,6 +450,40 @@ private QueryBlockInfo convertSource(RelNode r) throws 
CalciteSemanticException
     return new QueryBlockInfo(s, ast);
   }
 
+  /**
+   * Add PK-FK join information to the AST as a query hint
+   * @param ast
+   * @param join
+   * @param swapSides whether the left and right input of the join is swapped
+   */
+  private void addPkFkInfoToAST(ASTNode ast, Join join, boolean swapSides) {
+    List<RexNode> joinFilters = new 
ArrayList<>(RelOptUtil.conjunctions(join.getCondition()));
+    RelMetadataQuery mq = join.getCluster().getMetadataQuery();
+    HiveRelOptUtil.PKFKJoinInfo rightInputResult =
+            HiveRelOptUtil.extractPKFKJoin(join, joinFilters, false, mq);
+    HiveRelOptUtil.PKFKJoinInfo leftInputResult =
+            HiveRelOptUtil.extractPKFKJoin(join, joinFilters, true, mq);
+    // Add the fkJoinIndex (0=left, 1=right, if swapSides is false) to the AST
+    // check if the nonFK side is filtered
+    if (leftInputResult.isPkFkJoin && 
leftInputResult.additionalPredicates.isEmpty()) {
+      RelNode nonFkInput = join.getRight();
+      ast.addChild(pkFkHint(swapSides ? 1 : 0, 
HiveRelOptUtil.isRowFilteringPlan(mq, nonFkInput)));
+    } else if (rightInputResult.isPkFkJoin && 
rightInputResult.additionalPredicates.isEmpty()) {
+      RelNode nonFkInput = join.getLeft();
+      ast.addChild(pkFkHint(swapSides ? 0 : 1, 
HiveRelOptUtil.isRowFilteringPlan(mq, nonFkInput)));
+    }
+  }
+
+  private ASTNode pkFkHint(int fkTableIndex, boolean nonFkSideIsFiltered) {
+    ParseDriver parseDriver = new ParseDriver();
+    try {
+      return parseDriver.parseHint(String.format("PKFK_JOIN(%d, %s)",
+              fkTableIndex, nonFkSideIsFiltered ? NON_FK_FILTERED : 
"notFiltered"));

Review comment:
       Naming (NON_FK_FILTERED : "notFiltered") is a bit confusing. We can 
simplify to NON_FK_FILTERED vs NON_FK_NOT_FILTERED? Create String in converter 
for both (or enum).

##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyPushdownProcessor.java
##########
@@ -255,6 +260,88 @@ private void pushdownThroughLeftOuterJoin(TopNKeyOperator 
topNKey) throws Semant
     }
   }
 
+  private void pushdownInnerJoin(TopNKeyOperator topNKey, int 
fkJoinInputIndex, boolean nonFkSideIsFiltered) throws SemanticException {
+    TopNKeyDesc topNKeyDesc = topNKey.getConf();
+    CommonJoinOperator<? extends JoinDesc> join =
+            (CommonJoinOperator<? extends JoinDesc>) 
topNKey.getParentOperators().get(0);
+    List<Operator<? extends OperatorDesc>> joinInputs = 
join.getParentOperators();
+    ReduceSinkOperator fkJoinInput = (ReduceSinkOperator) 
joinInputs.get(fkJoinInputIndex);
+    if (nonFkSideIsFiltered) {
+      LOG.debug("Not pushing {} through {} as non FK side of the join is 
filtered", topNKey.getName(), join.getName());
+      return;
+    }
+    // Check column origins:
+    //  1. If all OrderBy columns are coming from the child (FK) table:
+    //    -> move TopNKeyOperator
+    //  2. If the first n OrderBy columns are coming from the child (FK) table:
+    //    -> copy TopNKeyOperator with the first n columns, and leave the 
original in place
+    int prefixLength = keyColumnPrefixLength(join, topNKey, fkJoinInputIndex, 
topNKey.getConf().getKeyColumns());
+    if (prefixLength == 0) {
+      LOG.debug("Not pushing {} through {} as common key column prefix length 
is 0", topNKey.getName(), join.getName());
+      return;
+    }
+    LOG.debug("Pushing a copy of {} through {} and {}",
+            topNKey.getName(), join.getName(), fkJoinInput.getName());
+    TopNKeyDesc newTopNKeyDesc = topNKeyDesc.withKeyColumns(prefixLength);
+    newTopNKeyDesc.setKeyColumns(remapColumns(join, fkJoinInput, 
newTopNKeyDesc.getKeyColumns()));
+    pushdown(copyDown(fkJoinInput, newTopNKeyDesc));
+    if (topNKeyDesc.getKeyColumns().size() == prefixLength) {
+      LOG.debug("Removing {} above {}", topNKey.getName(), join.getName());
+      join.removeChildAndAdoptItsChildren(topNKey);
+    }
+  }
+
+  private int keyColumnPrefixLength(CommonJoinOperator<? extends JoinDesc> 
join, TopNKeyOperator topNKeyOperator, int expectedTag, List<ExprNodeDesc> 
keyColumns) {

Review comment:
       nit. Some lines are very long. Please split in multiple lines.
   
   Additionally, can we some comments to these utility methods?

##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/topnkey/TopNKeyPushdownProcessor.java
##########
@@ -255,6 +260,88 @@ private void pushdownThroughLeftOuterJoin(TopNKeyOperator 
topNKey) throws Semant
     }
   }
 
+  private void pushdownInnerJoin(TopNKeyOperator topNKey, int 
fkJoinInputIndex, boolean nonFkSideIsFiltered) throws SemanticException {

Review comment:
       Can we add a comment to this method? In which cases we can pushdown 
through inner join?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 461405)
    Time Spent: 20m  (was: 10m)

> Pushing TopN Key operator PKFK inner joins
> ------------------------------------------
>
>                 Key: HIVE-23817
>                 URL: https://issues.apache.org/jira/browse/HIVE-23817
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Attila Magyar
>            Assignee: Attila Magyar
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> If there is primary key foreign key relationship between the tables we can 
> push the topnkey operator through the join.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (HIVE-23817) Pushing TopN Key operator PKFK inner joins

Reply via email to