[
https://issues.apache.org/jira/browse/CALCITE-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263708#comment-17263708
]
Julian Hyde commented on CALCITE-4465:
--------------------------------------
An implementation note. I'd focus on the single-column case. That is, given the
expression "x in ('a', 'b') AND y in ('c', 'd', 'e')", focus on "how many
distinct values for x?" and "how many distinct values for y?". You can then
compose those results by multiplication.
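
The single-column approach above can be sketched roughly as follows. This is
only an illustration, not Calcite code: the class {{FilterNdvSketch}} and its
methods are hypothetical names. Capping each column at its unfiltered NDV, and
the product at the row count, are assumptions added here rather than part of
the note.

```java
/** Hypothetical helper for illustration only; not part of Calcite. */
final class FilterNdvSketch {

  /**
   * Upper bound on the NDV of one column under a filter such as
   * "x in ('a', 'b')": the filter admits at most allowedValues distinct
   * values, and (assumption) never more than the column had before filtering.
   */
  static double columnNdv(double unfilteredNdv, int allowedValues) {
    return Math.min(unfilteredNdv, allowedValues);
  }

  /**
   * Composes per-column bounds by multiplication. Capping the product at the
   * row count is an assumption added for this sketch.
   */
  static double combinedNdv(double rowCount, double... perColumnNdv) {
    double product = 1d;
    for (double ndv : perColumnNdv) {
      product *= ndv;
    }
    return Math.min(product, rowCount);
  }

  public static void main(String[] args) {
    // x in ('a', 'b') AND y in ('c', 'd', 'e'), with made-up input statistics.
    double ndvX = columnNdv(100d, 2);
    double ndvY = columnNdv(50d, 3);
    System.out.println(combinedNdv(1000d, ndvX, ndvY)); // prints 6.0
  }
}
```

Keeping the per-column step separate from the composition step means each
kind of condition ({{=}}, {{IN}}, and so on) only has to answer the
single-column question.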
If someone asks "how many distinct values are there for (c1, c2, c3, c4, c5)?",
find the unique keys. If, for example, you know that c1 is a key and (c2, c3)
is a composite key, then the NDV is
{{least(ndv(c1), ndv(c2, c3), row count)}}.
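
A minimal sketch of that {{least(...)}} rule, assuming the NDV estimate for
each known key and the row count are already available. The class
{{KeyNdvSketch}} and the method {{ndvFromKeys}} are hypothetical names, not
Calcite API, and the numbers in {{main}} are made up.

```java
/** Hypothetical helper for illustration only; not part of Calcite. */
final class KeyNdvSketch {

  /**
   * Returns the least of the row count and the NDV estimates of the unique
   * keys found among the requested columns, as in
   * least(ndv(c1), ndv(c2, c3), row count).
   */
  static double ndvFromKeys(double rowCount, double... keyNdvs) {
    double result = rowCount;
    for (double ndv : keyNdvs) {
      result = Math.min(result, ndv);
    }
    return result;
  }

  public static void main(String[] args) {
    // Made-up estimates: ndv(c1) = 1000, ndv(c2, c3) = 800, row count = 900.
    System.out.println(ndvFromKeys(900d, 1000d, 800d)); // prints 800.0
  }
}
```

If no keys are known, the varargs list is empty and the sketch falls back to
the row count.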
> Estimate the number of distinct values by filter condition
> ----------------------------------------------------------
>
> Key: CALCITE-4465
> URL: https://issues.apache.org/jira/browse/CALCITE-4465
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Reporter: Liya Fan
> Assignee: Liya Fan
> Priority: Major
>
> According to our current implementation ({{RelMdDistinctRowCount}}),
> estimating the number of distinct values (NDV) does not make good use of
> the filter condition. It simply forwards the call to its input operator with
> the filter condition attached.
> In fact, more information can be obtained for some special but commonly used
> conditions. For example, given condition {{x = 'a'}}, we can deduce that
> {{NDV( x ) <= 1}}. Given condition {{x in ('a', 'b')}}, we can deduce that
> {{NDV( x ) <= 2}}.
> More generally, if we have {{x in ('a', 'b') AND y in ('c', 'd', 'e')}}, we
> have {{NDV(x, y) <= 2 * 3 = 6}}.
> Thoughts?