[ 
https://issues.apache.org/jira/browse/DRILL-7187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830434#comment-16830434
 ] 

ASF GitHub Bot commented on DRILL-7187:
---------------------------------------

amansinha100 commented on pull request #1772: DRILL-7187: Improve selectivity 
estimation of BETWEEN predicates and …
URL: https://github.com/apache/drill/pull/1772#discussion_r279828965
 
 

 ##########
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdSelectivity.java
 ##########
 @@ -356,8 +426,8 @@ private boolean isMultiColumnPredicate(final RexNode node) {
     return findAllRexInputRefs(node).size() > 1;
   }
 
-  private static List<RexInputRef> findAllRexInputRefs(final RexNode node) {
-      List<RexInputRef> rexRefs = new ArrayList<>();
+  private static Set<RexInputRef> findAllRexInputRefs(final RexNode node) {
 
 Review comment:
   Yes, thanks for pointing that out, although the predicate `$0=$0` is 
unexpected (something to investigate in the future).  I have reverted this 
change, so the method returns a List as before.  Instead, at the original call 
site of `isMultiColumnPredicate()` I added a second condition (line 182) 
ensuring that conditions of the form `$1 > 10 AND $1 < 20`, which are created 
by `preProcessRangeConditions()`, are not treated as multi-column 
predicates.  
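
A minimal sketch of the idea described above, not the code in the pull request: `Expr`, `isSingleColumnRangeConjunction()` and `treatAsMultiColumn()` are hypothetical names standing in for Calcite's `RexNode` tree and for the extra condition at the call site. The reference collector keeps duplicates (a List), so `$0=$0` still counts as two references, while a conjunction of range comparisons on one column is exempted separately.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MultiColumnCheckSketch {

  // Hypothetical stand-in for a RexNode tree; not Drill's or Calcite's classes.
  static final class Expr {
    final String op;            // "REF", "LIT", or an operator such as "AND", ">", "<", "="
    final int column;           // column ordinal for "REF", otherwise -1
    final List<Expr> operands;  // children of an operator node

    private Expr(String op, int column, List<Expr> operands) {
      this.op = op;
      this.column = column;
      this.operands = operands;
    }

    static Expr ref(int column) { return new Expr("REF", column, new ArrayList<>()); }
    static Expr lit() { return new Expr("LIT", -1, new ArrayList<>()); }
    static Expr call(String op, Expr... operands) {
      return new Expr(op, -1, Arrays.asList(operands));
    }
  }

  // Collects every column reference and keeps duplicates (List, not Set).
  static List<Integer> findAllInputRefs(Expr e) {
    List<Integer> refs = new ArrayList<>();
    collect(e, refs);
    return refs;
  }

  private static void collect(Expr e, List<Integer> out) {
    if ("REF".equals(e.op)) {
      out.add(e.column);
    }
    for (Expr child : e.operands) {
      collect(child, out);
    }
  }

  static boolean isMultiColumnPredicate(Expr e) {
    return findAllInputRefs(e).size() > 1;
  }

  private static boolean isRangeOp(String op) {
    return op.equals(">") || op.equals(">=") || op.equals("<") || op.equals("<=");
  }

  // Assumed form of the second condition: an AND whose conjuncts are all
  // range comparisons that reference one and the same column exactly once.
  static boolean isSingleColumnRangeConjunction(Expr e) {
    if (!"AND".equals(e.op)) {
      return false;
    }
    int col = -1;
    for (Expr conjunct : e.operands) {
      List<Integer> refs = findAllInputRefs(conjunct);
      if (!isRangeOp(conjunct.op) || refs.size() != 1) {
        return false;
      }
      int c = refs.get(0);
      if (col == -1) {
        col = c;
      } else if (col != c) {
        return false;
      }
    }
    return col != -1;
  }

  static boolean treatAsMultiColumn(Expr e) {
    return isMultiColumnPredicate(e) && !isSingleColumnRangeConjunction(e);
  }

  public static void main(String[] args) {
    Expr range = Expr.call("AND",
        Expr.call(">", Expr.ref(1), Expr.lit()),
        Expr.call("<", Expr.ref(1), Expr.lit()));
    Expr selfEq = Expr.call("=", Expr.ref(0), Expr.ref(0));
    System.out.println(treatAsMultiColumn(range));   // false
    System.out.println(treatAsMultiColumn(selfEq));  // true
  }
}
```

In this model, switching the collector to a Set would make `$0=$0` look like a single-column predicate, which is why the duplicate-preserving List is kept and the range case is handled by its own check.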
 


> Improve selectivity estimates for range predicates when using histogram
> -----------------------------------------------------------------------
>
>                 Key: DRILL-7187
>                 URL: https://issues.apache.org/jira/browse/DRILL-7187
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Major
>             Fix For: 1.17.0
>
>
> Two types of selectivity estimation improvements are needed:
> 1.  For range predicates on the same column, we need to collect all such 
> predicates in one group and do a single histogram lookup for them together. 
> For instance: 
> {noformat}
>  WHERE a > 10 AND b < 20 AND c = 100 AND a <= 50 AND b < 50
> {noformat}
>  Currently, Drill treats each of the conjuncts independently and multiplies 
> the individual selectivities.  However, that does not give accurate 
> estimates, because range predicates on the same column are not independent 
> of each other.  Here, we want to group the predicates on 'a' together and do 
> a single lookup.  Similarly for 'b'.  
> 2. NULLs are not maintained by the histogram, but when doing the selectivity 
> calculations, the histogram should use the totalRowCount as the denominator 
> rather than the non-null count. 
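
The two points above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: `Pred`, `Range`, `Histogram` and the method names are hypothetical, the histogram is modeled as equi-depth with linear interpolation inside a bucket, and inclusive versus exclusive bounds are ignored. It is not Drill's histogram implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RangeSelectivitySketch {

  // One conjunct of the form: column op value (hypothetical representation).
  static final class Pred {
    final String column;
    final String op;
    final double value;
    Pred(String column, String op, double value) {
      this.column = column;
      this.op = op;
      this.value = value;
    }
  }

  // Combined bounds for all range predicates on one column.
  static final class Range {
    double lo = Double.NEGATIVE_INFINITY;
    double hi = Double.POSITIVE_INFINITY;
  }

  // Point 1: group range conjuncts per column, keeping the tightest bounds.
  // Non-range conjuncts such as "c = 100" are skipped here.
  static Map<String, Range> groupRanges(Pred... conjuncts) {
    Map<String, Range> ranges = new LinkedHashMap<>();
    for (Pred p : conjuncts) {
      boolean lower = p.op.equals(">") || p.op.equals(">=");
      boolean upper = p.op.equals("<") || p.op.equals("<=");
      if (!lower && !upper) {
        continue;
      }
      Range r = ranges.computeIfAbsent(p.column, k -> new Range());
      if (lower) {
        r.lo = Math.max(r.lo, p.value);
      } else {
        r.hi = Math.min(r.hi, p.value);
      }
    }
    return ranges;
  }

  // Assumed equi-depth histogram over non-null values only: each bucket
  // [boundaries[i], boundaries[i+1]] holds rowsPerBucket rows.
  static final class Histogram {
    final double[] boundaries;
    final double rowsPerBucket;
    Histogram(double[] boundaries, double rowsPerBucket) {
      this.boundaries = boundaries;
      this.rowsPerBucket = rowsPerBucket;
    }

    // Estimated non-null rows with value <= v.
    double rowsAtOrBelow(double v) {
      int last = boundaries.length - 1;
      if (v <= boundaries[0]) {
        return 0;
      }
      if (v >= boundaries[last]) {
        return rowsPerBucket * last;
      }
      for (int i = 0; i < last; i++) {
        if (v < boundaries[i + 1]) {
          double frac = (v - boundaries[i]) / (boundaries[i + 1] - boundaries[i]);
          return rowsPerBucket * (i + frac);
        }
      }
      return rowsPerBucket * last;
    }

    double rowsInRange(double lo, double hi) {
      return hi <= lo ? 0 : rowsAtOrBelow(hi) - rowsAtOrBelow(lo);
    }
  }

  // Point 2: divide by the total row count (NULLs included), not the non-null count.
  static double selectivity(Histogram h, Range r, double totalRowCount) {
    return h.rowsInRange(r.lo, r.hi) / totalRowCount;
  }

  public static void main(String[] args) {
    Map<String, Range> ranges = groupRanges(
        new Pred("a", ">", 10), new Pred("b", "<", 20), new Pred("c", "=", 100),
        new Pred("a", "<=", 50), new Pred("b", "<", 50));
    // Made-up statistics: 800 non-null rows in 4 buckets, 1000 rows in total.
    Histogram histA = new Histogram(new double[] {0, 25, 50, 75, 100}, 200);
    System.out.println(selectivity(histA, ranges.get("a"), 1000));
  }
}
```

With these made-up numbers, the grouped lookup for `a` covers about 320 of 1000 rows (0.32). Multiplying the two conjuncts independently would give about 0.72 × 0.40 = 0.288, and dividing by the 800 non-null rows instead of the total would give 0.40.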



