>From <[email protected]>:

[email protected] has uploaded this change for review. ( 
https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/19307 )


Change subject: [ASTERIXDB-3546][COMP] Use actual distinct estimate from sample 
if table size is small
......................................................................

[ASTERIXDB-3546][COMP] Use actual distinct estimate from sample if table size 
is small

Change-Id: I2894a6daf76573d22541a13066e495c9d286b3ae
---
M 
asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/cbo/Stats.java
1 file changed, 24 insertions(+), 1 deletion(-)



  git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb 
refs/changes/07/19307/1

diff --git 
a/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/cbo/Stats.java
 
b/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/cbo/Stats.java
index 66f0ad5..3d97469 100644
--- 
a/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/cbo/Stats.java
+++ 
b/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/cbo/Stats.java
@@ -195,7 +195,21 @@
             }

             double estDistinctCardinalityFromSample = 
findPredicateCardinality(result, true);
-            double numDistincts = 
distinctEstimator2(estDistinctCardinalityFromSample, index);
+            Index.SampleIndexDetails details = (Index.SampleIndexDetails) 
index.getIndexDetails();
+            double numDistincts;
+            // if the table is smaller than the sample size, there is no need 
to use the estimator
+            //                                            
getSampleCardinalityTarget() equals 1063 or 4252 or 17008
+            if (details.getSourceCardinality() <= 
details.getSampleCardinalityTarget()) {
+                numDistincts = estDistinctCardinalityFromSample;
+            } else { // when the number of distincts is smaller than approx 
25% of the sample size,
+                     // we do not need to call the estimator. This is 
a good heuristic. It was obtained by looking at the graph
+                     // of d = D * (1 - e^(-getSampleCardinalityTarget/D)); d = 
estDistinctCardinalityFromSample; D = actual number of distincts
+                if (estDistinctCardinalityFromSample <= 0.25 * 
details.getSampleCardinalityTarget()) {
+                    numDistincts = estDistinctCardinalityFromSample;
+                } else {
+                    numDistincts = 
distinctEstimator2(estDistinctCardinalityFromSample, index);
+                }
+            }
             return 1.0 / numDistincts; // this is the expected selectivity for 
joins.
         }
     }
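
The 25% heuristic in the comment above can be checked numerically. The following is a minimal sketch, not the code from Stats.java: the class name, the method names, and the estimator parameter (a stand-in for distinctEstimator2, whose internals are not shown in this change) are hypothetical. It evaluates d = D * (1 - e^(-n/D)) for a sample of n rows and mirrors the branch structure of the patch. Under this model the fraction of distincts missing from the sample is e^(-n/D), which is only about 2% when d is near 0.25 * n and shrinks further below that.

```java
import java.util.function.DoubleUnaryOperator;

public class DistinctHeuristicSketch {

    // Expected number of distinct values d observed in a sample of n rows
    // when the table holds D distinct values: d = D * (1 - e^(-n/D)).
    static double expectedSampleDistincts(double actualDistincts, double sampleSize) {
        return actualDistincts * (1.0 - Math.exp(-sampleSize / actualDistincts));
    }

    // Hypothetical mirror of the patched branch logic. The estimator argument
    // is a placeholder for distinctEstimator2, which is not reproduced here.
    static double chooseNumDistincts(double sourceCardinality, double sampleTarget,
            double sampleDistincts, DoubleUnaryOperator estimator) {
        if (sourceCardinality <= sampleTarget) {
            // The sample covers the whole table, so the count is exact.
            return sampleDistincts;
        }
        if (sampleDistincts <= 0.25 * sampleTarget) {
            // Few distincts relative to the sample size: the sample has
            // almost certainly seen nearly all of them.
            return sampleDistincts;
        }
        return estimator.applyAsDouble(sampleDistincts);
    }

    public static void main(String[] args) {
        double n = 1063; // smallest sample target mentioned in the patch comment
        for (double actual : new double[] { 100, 250, 500, 1000, 5000 }) {
            double d = expectedSampleDistincts(actual, n);
            System.out.printf("D=%.0f d=%.1f d/n=%.3f missed=%.4f%n",
                    actual, d, d / n, (actual - d) / actual);
        }
    }
}
```

Running main prints, for each assumed D, the expected sample count d, its ratio to n, and the fraction of distincts the sample misses, which makes it easy to see where the 0.25 cutoff sits on the curve.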

--
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/19307
To unsubscribe, or for help writing mail filters, visit 
https://asterix-gerrit.ics.uci.edu/settings

Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I2894a6daf76573d22541a13066e495c9d286b3ae
Gerrit-Change-Number: 19307
Gerrit-PatchSet: 1
Gerrit-Owner: [email protected]
Gerrit-MessageType: newchange
