[jira] [Work logged] (HIVE-19653) Incorrect predicate pushdown for groupby with grouping sets

ASF GitHub Bot (Jira) Tue, 09 Jun 2020 09:10:16 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-19653?focusedWorklogId=442887&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442887
 ]


ASF GitHub Bot logged work on HIVE-19653:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Jun/20 16:09
            Start Date: 09/Jun/20 16:09
    Worklog Time Spent: 10m 
      Work Description: jcamachor commented on a change in pull request #1070:
URL: https://github.com/apache/hive/pull/1070#discussion_r437021422



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
##########
@@ -733,6 +736,97 @@ private void applyFilterTransitivity(JoinOperator join, 
int targetPos, OpWalkerI
     }
   }
 
+  public static class GroupByPPD extends DefaultPPD implements 
SemanticNodeProcessor {
+
+    @Override
+    public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
+        Object... nodeOutputs) throws SemanticException {
+      super.process(nd, stack, procCtx, nodeOutputs);
+      OpWalkerInfo owi = (OpWalkerInfo) procCtx;
+      GroupByDesc groupByDesc = ((GroupByOperator)nd).getConf();
+      ExprWalkerInfo prunedPred = owi.getPrunedPreds((Operator<? extends 
OperatorDesc>) nd);
+      if (prunedPred == null || !prunedPred.hasAnyCandidates() ||
+          !groupByDesc.isGroupingSetsPresent()) {
+        return null;
+      }
+
+      List<Long> groupingSets = groupByDesc.getListGroupingSets();
+      Map<String, List<ExprNodeDesc>> candidates = 
prunedPred.getFinalCandidates();
+      FastBitSet[] fastBitSets = new FastBitSet[groupingSets.size()];
+      int groupingSetPosition = groupByDesc.getGroupingSetPosition();
+      for (int pos = 0; pos < fastBitSets.length; pos ++) {
+        fastBitSets[pos] = 
GroupByOperator.groupingSet2BitSet(groupingSets.get(pos),
+            groupingSetPosition);
+      }
+      List<ExprNodeDesc> groupByKeys = 
((GroupByOperator)nd).getConf().getKeys();
+      Map<ExprNodeDesc, ExprNodeDesc> newToOldExprMap = 
prunedPred.getNewToOldExprMap();
+      Map<String, List<ExprNodeDesc>> nonFinalCandidates = new HashMap<String, 
List<ExprNodeDesc>>();
+      for (Iterator<Map.Entry<String, List<ExprNodeDesc>>>

Review comment:
       Please use a `while`.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
##########
@@ -733,6 +736,97 @@ private void applyFilterTransitivity(JoinOperator join, 
int targetPos, OpWalkerI
     }
   }
 
+  public static class GroupByPPD extends DefaultPPD implements 
SemanticNodeProcessor {
+
+    @Override
+    public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
+        Object... nodeOutputs) throws SemanticException {
+      super.process(nd, stack, procCtx, nodeOutputs);
+      OpWalkerInfo owi = (OpWalkerInfo) procCtx;
+      GroupByDesc groupByDesc = ((GroupByOperator)nd).getConf();
+      ExprWalkerInfo prunedPred = owi.getPrunedPreds((Operator<? extends 
OperatorDesc>) nd);
+      if (prunedPred == null || !prunedPred.hasAnyCandidates() ||
+          !groupByDesc.isGroupingSetsPresent()) {
+        return null;
+      }
+
+      List<Long> groupingSets = groupByDesc.getListGroupingSets();
+      Map<String, List<ExprNodeDesc>> candidates = 
prunedPred.getFinalCandidates();
+      FastBitSet[] fastBitSets = new FastBitSet[groupingSets.size()];
+      int groupingSetPosition = groupByDesc.getGroupingSetPosition();
+      for (int pos = 0; pos < fastBitSets.length; pos ++) {
+        fastBitSets[pos] = 
GroupByOperator.groupingSet2BitSet(groupingSets.get(pos),
+            groupingSetPosition);
+      }
+      List<ExprNodeDesc> groupByKeys = 
((GroupByOperator)nd).getConf().getKeys();
+      Map<ExprNodeDesc, ExprNodeDesc> newToOldExprMap = 
prunedPred.getNewToOldExprMap();
+      Map<String, List<ExprNodeDesc>> nonFinalCandidates = new HashMap<String, 
List<ExprNodeDesc>>();
+      for (Iterator<Map.Entry<String, List<ExprNodeDesc>>>
+           iter = candidates.entrySet().iterator(); iter.hasNext(); ) {
+        Map.Entry<String, List<ExprNodeDesc>> entry = iter.next();
+        List<ExprNodeDesc> residualExprs = new ArrayList<ExprNodeDesc>();
+        List<ExprNodeDesc> finalCandidates = new ArrayList<ExprNodeDesc>();
+        List<ExprNodeDesc> exprs = entry.getValue();
+        for (ExprNodeDesc expr : exprs) {
+          if (canPredPushdown(expr, groupByKeys, fastBitSets, 
groupingSetPosition)) {
+            finalCandidates.add(expr);
+          } else {
+            residualExprs.add(newToOldExprMap.get(expr));
+          }
+        }
+        if (!residualExprs.isEmpty()) {
+          nonFinalCandidates.put(entry.getKey(), residualExprs);
+        }
+
+        if (finalCandidates.isEmpty()) {
+          iter.remove();
+        } else {
+          exprs.clear();
+          exprs.addAll(finalCandidates);
+        }
+      }
+      
+      if (!nonFinalCandidates.isEmpty()) {
+        createFilter((Operator) nd, nonFinalCandidates, owi);
+      }
+      return null;
+    }
+
+    private boolean canPredPushdown(ExprNodeDesc expr, List<ExprNodeDesc> 
groupByKeys,
+        FastBitSet[] bitSets, int groupingSetPosition) {
+      List<ExprNodeDesc> columns = new ArrayList<ExprNodeDesc>();
+      extractCols(expr, columns);
+      for (ExprNodeDesc col : columns) {
+        int index = groupByKeys.indexOf(col);
+        assert index >= 0;
+        for (FastBitSet bitset : bitSets) {
+          int keyPos = bitset.nextClearBit(0);
+          for (; keyPos < groupingSetPosition && keyPos != index;

Review comment:
       Please use a `while`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 442887)
    Time Spent: 1h 10m  (was: 1h)

> Incorrect predicate pushdown for groupby with grouping sets
> -----------------------------------------------------------
>
>                 Key: HIVE-19653
>                 URL: https://issues.apache.org/jira/browse/HIVE-19653
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer
>            Reporter: Zhang Li
>            Assignee: Zhihua Deng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-19653.1.patch, HIVE-19653.patch
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Consider the following query:
> {code:java}
> CREATE TABLE T1(a STRING, b STRING, s BIGINT);
> INSERT OVERWRITE TABLE T1 VALUES ('aaaa', 'bbbb', 123456);
> SELECT * FROM (
> SELECT a, b, sum(s)
> FROM T1
> GROUP BY a, b GROUPING SETS ((), (a), (b), (a, b))
> ) t WHERE a IS NOT NULL;
> {code}
> When hive.optimize.ppd is enabled (and hive.cbo.enable=false), the query will 
> output:
> {code:java}
> NULL  NULL    123456
> NULL  bbbb    123456
> aaaa  NULL    123456
> aaaa  bbbb    123456
> {code}
> We can see the predicate "a IS NOT NULL" takes no effect, which is incorrect.
> When performing PPD optimization for a GBY operator, we should make sure all 
> grouping sets contains the processing expr before pushdown. otherwise the 
> expr value after GBY is changed and the result is wrong.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (HIVE-19653) Incorrect predicate pushdown for groupby with grouping sets

Reply via email to