[
https://issues.apache.org/jira/browse/IMPALA-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860283#comment-17860283
]
ASF subversion and git services commented on IMPALA-8042:
---------------------------------------------------------
Commit 101e10ba3189db0e115cfb98bb8fe7ac1b108186 in impala's branch
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=101e10ba3 ]
IMPALA-6311: Lower max_filter_error_rate to 10%
Recent changes such as IMPALA-11924 and IMPALA-8042 managed to make NDV
estimates more accurate in some cases. However, the more accurate
(smaller) NDV estimates after these changes have exacerbated the problem
with the 75% default false positive probability (FPP), which causes more
cases of badly undersized filters.
This patch lowers the default value of the max_filter_error_rate flag
from 75% to 10%. A lower target FPP will double the runtime filter size
in most cases where the previous FPP was greater than 10%.
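To illustrate why a lower target FPP usually costs about one doubling of
the filter, here is a minimal sketch in Python. It assumes the textbook
Bloom filter approximation with a fixed number of bits set per key
(K = 8) and power-of-two filter sizes. The function names and the
constant are illustrative assumptions, not Impala's actual code, and the
real sizing logic may differ.

```python
import math

# Assumption: each inserted key sets a fixed number of bits in the filter.
K = 8


def false_positive_prob(ndv: int, log_space_bytes: int) -> float:
    """Approximate FPP of a filter of 2**log_space_bytes bytes holding ndv keys.

    Uses the textbook estimate (1 - exp(-K * n / m)) ** K, with m in bits.
    """
    bits = 8 * (1 << log_space_bytes)
    return (1.0 - math.exp(-K * ndv / bits)) ** K


def min_log_space_bytes(ndv: int, target_fpp: float) -> int:
    """Smallest power-of-two size (as log2 of bytes) whose FPP meets the target."""
    if ndv < 0 or not 0.0 < target_fpp < 1.0:
        raise ValueError("ndv must be >= 0 and target_fpp must be in (0, 1)")
    log_space = 0
    # Keep doubling the filter until the estimated FPP is low enough.
    while false_positive_prob(ndv, log_space) > target_fpp:
        log_space += 1
    return log_space


if __name__ == "__main__":
    ndv = 1_000_000
    for target in (0.75, 0.10):
        log_space = min_log_space_bytes(ndv, target)
        fpp = false_positive_prob(ndv, log_space)
        print(f"target={target:.2f} size=2^{log_space} bytes fpp={fpp:.4f}")
```

Under these assumptions, 1,000,000 distinct values need 2^19 bytes to
stay under 75% and 2^20 bytes to stay under 10%, so the stricter target
costs a single doubling. A filter whose size was already large enough to
land under 10% would not grow at all.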
Testing:
- Passed exhaustive tests.
- Manually ran a TPC-DS test at 3 TB comparing 10% to 75%. A value of
10% improves q94 by 2x and q95 by 5x, improves total query time and
geomean time by a few percent, and doesn't cause a significant (> 10%)
regression in any individual query.
Change-Id: I4104e65cc3ce0ef4b36f6420f5044f2cdba9de04
Reviewed-on: http://gerrit.cloudera.org:8080/21552
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Better selectivity estimate for BETWEEN
> ---------------------------------------
>
> Key: IMPALA-8042
> URL: https://issues.apache.org/jira/browse/IMPALA-8042
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 3.1.0
> Reporter: Paul Rogers
> Assignee: Riza Suminto
> Priority: Minor
> Fix For: Impala 4.5.0
>
>
> The analyzer rewrites a BETWEEN expression into a pair of inequalities.
> IMPALA-8037 explains that the planner then groups all such non-equality
> conditions together and assigns a selectivity of 0.1. IMPALA-8031 explains
> that the analyzer should handle inequalities better.
> BETWEEN is a special case and informs the final result. If we assume a
> selectivity of s for inequality, then BETWEEN should be something like s/2.
> The intuition is that if c >= x includes, say, ⅓ of values, and c <= y
> includes a third of values, then c BETWEEN x AND y should be a narrower set
> of values, say ⅙.
> [Ramakrishnan and
> Gehrke|http://pages.cs.wisc.edu/~dbbook/openAccess/Minibase/optimizer/costformula.html]
> recommend 0.4 for between, 0.3 for inequality, and 0.3^2 = 0.09 for the
> general expression x <= c AND c <= y. Note the discrepancy between the
> compound inequality case and the BETWEEN case, likely reflecting the
> additional information we obtain when the user chooses to use BETWEEN.
> To implement a special BETWEEN selectivity in Impala, we must remember the
> selectivity of BETWEEN during the rewrite to a compound inequality.
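
The arithmetic in the description above can be sketched in Python as
follows. This is only an illustration of the quoted numbers: the function
names are hypothetical, the s/2 rule is the rough suggestion from the
description, and none of this is Impala's planner code.

```python
def compound_inequality_selectivity(s: float) -> float:
    """Selectivity of x <= c AND c <= y when the two inequalities are
    treated as independent predicates, each with selectivity s."""
    return s * s


def between_selectivity_halved(s: float) -> float:
    """The description's rough suggestion: BETWEEN is about half as
    selective as a single inequality with selectivity s."""
    return s / 2.0


if __name__ == "__main__":
    # Textbook defaults quoted in the description (assumed, not measured).
    inequality = 0.3
    between_textbook = 0.4

    print("compound inequality:", compound_inequality_selectivity(inequality))
    print("BETWEEN, textbook default:", between_textbook)
    print("BETWEEN, s/2 with s = 1/3:", between_selectivity_halved(1.0 / 3.0))
```

The gap between the independent-product value and the dedicated BETWEEN
default is the discrepancy the description points out: the product
treats the two bounds as unrelated, while a BETWEEN tells the planner
that they delimit a single range.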