From <[email protected]>:
[email protected] has uploaded this change for review. (
https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/19307 )
Change subject: [ASTERIXDB-3546][COMP] Use actual distinct estimate from sample
if table size is small
......................................................................
[ASTERIXDB-3546][COMP] Use actual distinct estimate from sample if table size
is small
Change-Id: I2894a6daf76573d22541a13066e495c9d286b3ae
---
M
asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/cbo/Stats.java
1 file changed, 24 insertions(+), 1 deletion(-)
git pull ssh://asterix-gerrit.ics.uci.edu:29418/asterixdb
refs/changes/07/19307/1
diff --git
a/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/cbo/Stats.java
b/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/cbo/Stats.java
index 66f0ad5..3d97469 100644
---
a/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/cbo/Stats.java
+++
b/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules/cbo/Stats.java
@@ -195,7 +195,21 @@
}
double estDistinctCardinalityFromSample =
findPredicateCardinality(result, true);
- double numDistincts =
distinctEstimator2(estDistinctCardinalityFromSample, index);
+ Index.SampleIndexDetails details = (Index.SampleIndexDetails)
index.getIndexDetails();
+ double numDistincts;
+ // if the table is no larger than the sample size, there is no need
to use the estimator
+ // getSampleCardinalityTarget() returns 1063, 4252, or 17008
+ if (details.getSourceCardinality() <=
details.getSampleCardinalityTarget()) {
+ numDistincts = estDistinctCardinalityFromSample;
+ } else { // when the number of distincts in the sample is at most approx
25% of the sample size,
+ // we do not need to call the estimator. This heuristic was obtained
by looking at the graph
+ // of d = D * (1 - e^(-getSampleCardinalityTarget / D)); d =
estDistinctCardinalityFromSample; D = actual number of distincts
+ if (estDistinctCardinalityFromSample <= 0.25 *
details.getSampleCardinalityTarget()) {
+ numDistincts = estDistinctCardinalityFromSample;
+ } else {
+ numDistincts =
distinctEstimator2(estDistinctCardinalityFromSample, index);
+ }
+ }
return 1.0 / numDistincts; // this is the expected selectivity for
joins.
}
}
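
The branch logic in the hunk above can be sketched in isolation as follows.
This is a minimal illustration, not the code in Stats.java: the class name,
method names, and the pluggable estimator parameter are hypothetical, and the
curve d = D * (1 - e^(-n / D)) is taken directly from the patch comment, with
n standing for the sample size.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical standalone sketch of the heuristic described in the patch comment.
public class DistinctHeuristicSketch {

    // Curve from the patch comment: the number of distinct values d expected in a
    // sample of size n when the table holds D distinct values.
    public static double expectedSampleDistincts(double actualDistincts, double sampleSize) {
        return actualDistincts * (1.0 - Math.exp(-sampleSize / actualDistincts));
    }

    // Mirrors the three branches of the patch. The estimator argument stands in
    // for distinctEstimator2, whose implementation is not shown in this change.
    public static double chooseNumDistincts(double sampleDistincts, double sourceCardinality,
            double sampleTarget, DoubleUnaryOperator estimator) {
        if (sourceCardinality <= sampleTarget) {
            // Table is no larger than the sample target: use the sample count directly.
            return sampleDistincts;
        }
        if (sampleDistincts <= 0.25 * sampleTarget) {
            // Few distincts relative to the sample size: the curve is nearly d = D here.
            return sampleDistincts;
        }
        return estimator.applyAsDouble(sampleDistincts);
    }

    public static void main(String[] args) {
        double n = 1063; // smallest sample target named in the patch comment
        double actual = 0.25 * n; // D at the 25% threshold
        double seen = expectedSampleDistincts(actual, n);
        System.out.println("D = " + actual + ", d = " + seen + ", d / D = " + (seen / actual));
    }
}
```

At D = 0.25 * n the ratio d / D is 1 - e^(-4), roughly 0.98, so under this
curve the sample's distinct count is already within about 2% of the true
count, and the gap shrinks further as D gets smaller. That is consistent with
skipping the estimator below the 25% threshold.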
--
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/19307
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I2894a6daf76573d22541a13066e495c9d286b3ae
Gerrit-Change-Number: 19307
Gerrit-PatchSet: 1
Gerrit-Owner: [email protected]
Gerrit-MessageType: newchange