[jira] [Commented] (DRILL-5691) multiple count distinct query planning error at physical phase

ASF GitHub Bot (JIRA) Wed, 06 Sep 2017 08:08:15 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155500#comment-16155500
 ]


ASF GitHub Bot commented on DRILL-5691:
---------------------------------------

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/889#discussion_r137291711
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java ---
    @@ -203,35 +203,27 @@ public static void addLeastRestrictiveCasts(LogicalExpression[] leftExpressions,
       }
     
       /**
    -   * Utility method to check if a subquery (represented by its root RelNode) is provably scalar. Currently
    -   * only aggregates with no group-by are considered scalar. In the future, this method should be generalized
    -   * to include more cases and reconciled with Calcite's notion of scalar.
    +   * Utility method to check if a subquery (represented by its root RelNode) is provably scalar.
        * @param root The root RelNode to be examined
        * @return True if the root rel or its descendant is scalar, False otherwise
        */
       public static boolean isScalarSubquery(RelNode root) {
    -    DrillAggregateRel agg = null;
    -    RelNode currentrel = root;
    -    while (agg == null && currentrel != null) {
    -      if (currentrel instanceof DrillAggregateRel) {
    -        agg = (DrillAggregateRel)currentrel;
    -      } else if (currentrel instanceof RelSubset) {
    -        currentrel = ((RelSubset)currentrel).getBest() ;
    -      } else if (currentrel.getInputs().size() == 1) {
    -        // If the rel is not an aggregate or RelSubset, but is a single-input rel (could be Project,
    -        // Filter, Sort etc.), check its input
    -        currentrel = currentrel.getInput(0);
    -      } else {
    -        break;
    -      }
    -    }
    -
    -    if (agg != null) {
    -      if (agg.getGroupSet().isEmpty()) {
    -        return true;
    +    RelMetadataQuery relMetadataQuery = RelMetadataQuery.instance();
    +    RelNode currentRel = root;
    +    for (; ; ) {
    +      if (currentRel instanceof RelSubset) {
    +        currentRel = ((RelSubset) currentRel).getBest();
    +      } else if (currentRel != null) {
    +        Double rowCount = relMetadataQuery.getRowCount(currentRel);
    --- End diff --
    
    getRowCount() is not correct here. Please see my earlier comment about
    using the RelMdMaxRowCount.getMaxRowCount() APIs. The reason is that
    getRowCount() returns an estimate, which may not match the actual run-time
    value, whereas getMaxRowCount() is an assertion by the optimizer that the
    row count cannot exceed some number N (in your case, 1).
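
    To make the distinction concrete, here is a minimal, self-contained
    sketch. It does not use Calcite or Drill classes: PlanNode, Scan, Filter,
    Aggregate and both isScalarBy* helpers are hypothetical stand-ins, and the
    selectivity and grouping factors are made-up numbers. It only illustrates
    why an estimate of at most one row is not proof that a subquery is scalar,
    while a guaranteed upper bound of one row is.

```java
public class ScalarCheckSketch {

  /** Hypothetical stand-in for a plan node; not a Calcite or Drill type. */
  interface PlanNode {
    /** Statistics-based guess; the run-time row count may differ. */
    double estimatedRowCount();

    /** Guaranteed upper bound; positive infinity when no bound is known. */
    double maxRowCount();
  }

  static final class Scan implements PlanNode {
    private final double estimate;

    Scan(double estimate) {
      this.estimate = estimate;
    }

    public double estimatedRowCount() {
      return estimate;
    }

    public double maxRowCount() {
      return Double.POSITIVE_INFINITY; // a table scan has no provable bound
    }
  }

  static final class Filter implements PlanNode {
    private final PlanNode input;
    private final double selectivity; // assumed value, for illustration only

    Filter(PlanNode input, double selectivity) {
      this.input = input;
      this.selectivity = selectivity;
    }

    public double estimatedRowCount() {
      return input.estimatedRowCount() * selectivity;
    }

    public double maxRowCount() {
      return input.maxRowCount(); // a filter can never add rows
    }
  }

  static final class Aggregate implements PlanNode {
    private final PlanNode input;
    private final int groupKeyCount;

    Aggregate(PlanNode input, int groupKeyCount) {
      this.input = input;
      this.groupKeyCount = groupKeyCount;
    }

    public double estimatedRowCount() {
      // Arbitrary reduction factor when grouping; exactly one row otherwise.
      return groupKeyCount == 0 ? 1.0 : input.estimatedRowCount() / 10.0;
    }

    public double maxRowCount() {
      // With no group-by keys, this sketch treats the output as one row.
      return groupKeyCount == 0 ? 1.0 : input.maxRowCount();
    }
  }

  /** Unsafe check: relies on an estimate that can be wrong at run time. */
  static boolean isScalarByEstimate(PlanNode node) {
    return node.estimatedRowCount() <= 1.0;
  }

  /** Safe check: relies on a bound that holds for every execution. */
  static boolean isScalarByMaxRowCount(PlanNode node) {
    return node.maxRowCount() <= 1.0;
  }

  public static void main(String[] args) {
    PlanNode scan = new Scan(1000.0);
    PlanNode selectiveFilter = new Filter(scan, 0.0005);
    PlanNode globalAggregate = new Aggregate(scan, 0);

    System.out.println(isScalarByEstimate(selectiveFilter));    // true
    System.out.println(isScalarByMaxRowCount(selectiveFilter)); // false
    System.out.println(isScalarByEstimate(globalAggregate));    // true
    System.out.println(isScalarByMaxRowCount(globalAggregate)); // true
  }
}
```

    The highly selective filter is estimated at less than one row, so the
    estimate-based check wrongly accepts it even though it could return many
    rows at run time. Only the bound-based check separates it from the
    aggregate with no group-by.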


> multiple count distinct query planning error at physical phase 
> ---------------------------------------------------------------
>
>                 Key: DRILL-5691
>                 URL: https://issues.apache.org/jira/browse/DRILL-5691
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.9.0, 1.10.0
>            Reporter: weijie.tong
>
> I materialized the count distinct query results in a cache and added a plugin 
> rule that translates (Aggregate, Aggregate, Project, Scan) or 
> (Aggregate, Aggregate, Scan) to (Project, Scan) at the PARTITION_PRUNING phase. 
> Then, once a user issues a count distinct query, it is translated into a query 
> against the cache to get the result.
> eg1: "select count(*), sum(a), count(distinct b) from t where dt=xx"
> eg2: "select count(*), sum(a), count(distinct b), count(distinct c) from t 
> where dt=xxx"
> eg3: "select count(distinct b), count(distinct c) from t where dt=xxx"
> eg1 is planned correctly and returns the result I expected, but eg2 fails 
> at the physical phase. The error info is here: 
> https://gist.github.com/weijietong/1b8ed12db9490bf006e8b3fe0ee52269
> eg3 fails with a similar error.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
