[
https://issues.apache.org/jira/browse/DRILL-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810357#comment-16810357
]
ASF GitHub Bot commented on DRILL-7119:
---------------------------------------
gparai commented on pull request #1733: DRILL-7119: Compute range predicate
selectivity using histograms.
URL: https://github.com/apache/drill/pull/1733#discussion_r272391854
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/NumericEquiDepthHistogram.java
##########
@@ -69,27 +79,177 @@ public void setNumRowsPerBucket(long numRows) {
   }
   @Override
-  public Double estimatedSelectivity(RexNode filter) {
+  public Double estimatedSelectivity(final RexNode filter) {
     if (numRowsPerBucket >= 0) {
-      return 1.0;
-    } else {
-      return null;
+      // at a minimum, the histogram should have a start and end point of 1 bucket, so at least 2 entries
+      Preconditions.checkArgument(buckets.length >= 2, "Histogram has invalid number of entries");
+      final int first = 0;
+      final int last = buckets.length - 1;
+
+      // number of buckets is 1 less than the total # entries in the buckets array since last
+      // entry is the end point of the last bucket
+      final int numBuckets = buckets.length - 1;
+      final long totalRows = numBuckets * numRowsPerBucket;
+      if (filter instanceof RexCall) {
+        // get the operator
+        SqlOperator op = ((RexCall) filter).getOperator();
+        if (op.getKind() == SqlKind.GREATER_THAN ||
+            op.getKind() == SqlKind.GREATER_THAN_OR_EQUAL) {
+          Double value = getLiteralValue(filter);
+          if (value != null) {
+
+            // *** Handle the boundary conditions first ***
+
+            // if value is less than or equal to the first bucket's start point then all rows qualify
+            int result = value.compareTo(buckets[first]);
+            if (result <= 0) {
+              return LARGE_SELECTIVITY;
+            }
+            // if value is greater than the end point of the last bucket, then none of the rows qualify
+            result = value.compareTo(buckets[last]);
+            if (result > 0) {
+              return SMALL_SELECTIVITY;
Review comment:
For posterity, please add a comment explaining why SMALL_SELECTIVITY is
greater than 0 and what buckets[n] represents, if that isn't already
documented elsewhere.
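The reviewer's two questions can be illustrated with a minimal, self-contained sketch. It assumes (these are not taken from the PR) that `buckets` holds n + 1 sorted boundary values describing n buckets of roughly `numRowsPerBucket` rows each, that values are spread uniformly within a bucket, and that `SMALL_SELECTIVITY` is 0.0001; the class `EquiDepthSketch` and its method `greaterThan` are hypothetical. The two boundary checks mirror the ones in the diff. A common rationale for returning a small positive value rather than 0 is that the histogram is built from approximate (t-digest) statistics, and a selectivity of exactly 0 would zero out the estimated row count of the filter and of every operator above it.

```java
import java.util.Arrays;

/** Hypothetical equi-depth histogram sketch; not Drill's NumericEquiDepthHistogram. */
class EquiDepthSketch {
    // Assumed constants for illustration; the real values in Drill may differ.
    static final double SMALL_SELECTIVITY = 0.0001;
    static final double LARGE_SELECTIVITY = 1.0;

    // buckets[i] is the start point of bucket i and buckets[i + 1] is its end point,
    // so n + 1 boundary entries describe n buckets.
    private final double[] buckets;
    private final long numRowsPerBucket;

    EquiDepthSketch(double[] buckets, long numRowsPerBucket) {
        if (buckets.length < 2 || numRowsPerBucket <= 0) {
            throw new IllegalArgumentException("Histogram has invalid number of entries");
        }
        this.buckets = Arrays.copyOf(buckets, buckets.length);
        this.numRowsPerBucket = numRowsPerBucket;
    }

    /** Estimated selectivity of "col > value", interpolating linearly inside a partial bucket. */
    double greaterThan(double value) {
        int first = 0;
        int last = buckets.length - 1;
        if (value <= buckets[first]) {
            return LARGE_SELECTIVITY;  // every row is at or above the first start point
        }
        if (value > buckets[last]) {
            // No row in the histogram qualifies, but the histogram only approximates
            // the data, so report a small non-zero selectivity instead of 0.0.
            return SMALL_SELECTIVITY;
        }
        int numBuckets = buckets.length - 1;
        double totalRows = (double) numBuckets * numRowsPerBucket;
        double qualifyingRows = 0;
        for (int i = 0; i < numBuckets; i++) {
            double start = buckets[i];
            double end = buckets[i + 1];
            if (value <= start) {
                qualifyingRows += numRowsPerBucket;  // the whole bucket lies at or above value
            } else if (value < end) {
                // partial bucket: start < value < end, so end - start is never 0 here
                qualifyingRows += numRowsPerBucket * (end - value) / (end - start);
            }
        }
        // The same non-zero floor applies when value equals the last end point.
        return Math.max(qualifyingRows / totalRows, SMALL_SELECTIVITY);
    }
}
```

With boundaries {0, 10, 20, 30, 40} and 100 rows per bucket, `col > 25` covers half of the third bucket plus the whole fourth bucket, i.e. 150 of 400 rows.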
> Modify selectivity calculations to use histograms
> -------------------------------------------------
>
> Key: DRILL-7119
> URL: https://issues.apache.org/jira/browse/DRILL-7119
> Project: Apache Drill
> Issue Type: Sub-task
> Components: Query Planning & Optimization
> Reporter: Aman Sinha
> Assignee: Aman Sinha
> Priority: Major
> Fix For: 1.16.0
>
>
> (Please see parent JIRA for the design document)
> Once the t-digest-based histogram is created, we need to read it back and
> modify the selectivity calculations such that they use the histogram buckets
> for range conditions.
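For range conditions in general, one possible approach, sketched here under the same assumptions (sorted boundaries, equally populated buckets, uniform spread within a bucket), is to compute the approximate fraction of rows below a value and take differences: `col < v` is `fractionBelow(v)`, `col > v` is `1 - fractionBelow(v)`, and `lo < col < hi` is `fractionBelow(hi) - fractionBelow(lo)`. The class `RangeSelectivitySketch`, its methods, and the `floor` parameter are hypothetical, not Drill's API; because values are treated as continuous, the sketch does not distinguish strict from inclusive bounds.

```java
/** Hypothetical helper: range selectivity from equi-depth bucket boundaries. */
final class RangeSelectivitySketch {
    private RangeSelectivitySketch() {}

    /**
     * Approximate fraction of rows with col < value. buckets holds n + 1 sorted
     * boundaries for n equally populated buckets.
     */
    static double fractionBelow(double[] buckets, double value) {
        if (buckets.length < 2) {
            throw new IllegalArgumentException("Histogram has invalid number of entries");
        }
        int numBuckets = buckets.length - 1;
        if (value <= buckets[0]) {
            return 0.0;
        }
        if (value >= buckets[numBuckets]) {
            return 1.0;
        }
        for (int i = 0; i < numBuckets; i++) {
            double start = buckets[i];
            double end = buckets[i + 1];
            if (value < end) {
                // start <= value < end, so end > start and the division is safe
                double within = (value - start) / (end - start);
                return (i + within) / numBuckets;
            }
        }
        return 1.0;  // not reached when the boundaries are sorted
    }

    /** Approximate selectivity of lo < col < hi, never lower than the given floor. */
    static double range(double[] buckets, double lo, double hi, double floor) {
        double s = fractionBelow(buckets, hi) - fractionBelow(buckets, lo);
        return Math.max(s, floor);
    }
}
```

With boundaries {0, 10, 20, 30, 40}, `15 < col < 25` spans the upper half of the second bucket and the lower half of the third, giving 0.625 - 0.375 = 0.25. Building every range operator on one cumulative helper keeps `<`, `>`, and two-sided ranges consistent with each other.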