[ 
https://issues.apache.org/jira/browse/CALCITE-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263515#comment-17263515
 ] 

Julian Hyde commented on CALCITE-4465:
--------------------------------------

I like this approach. 

I suggest that you convert conditions to Sarg before applying this logic. Sargs 
are a much more uniform representation of points, ranges, range sets, and also 
handle nulls. 

I also strongly suggest that you add extra logic if the data type is discrete. 
After “x = 3 or x between 5 and 7”, x has at most 4 values if x is an INTEGER. 
If it is a non-discrete type, say REAL, the NDV is unbounded. 
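
To make the discrete case concrete, here is a rough sketch in Python (not 
Calcite code). It assumes the condition on one column has already been reduced 
to a list of closed ranges, roughly what a Sarg would give you, and it ignores 
nulls. The names merge, ndv_upper_bound and combined_bound are made up for 
this illustration; None stands for "unbounded".

```python
from math import prod
from typing import List, Optional, Tuple

# A closed interval [lo, hi]; a single point v is written (v, v).
Range = Tuple[float, float]


def merge(ranges: List[Range]) -> List[Range]:
    """Sort the ranges and merge any that overlap."""
    merged: List[Range] = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def ndv_upper_bound(ranges: List[Range], discrete: bool) -> Optional[int]:
    """Upper bound on NDV of one column; None means unbounded."""
    merged = merge(ranges)
    if discrete:
        # Integer-like type: each closed range holds hi - lo + 1 values.
        return sum(int(hi - lo) + 1 for lo, hi in merged)
    if all(lo == hi for lo, hi in merged):
        # Non-discrete type, but only points: one value per point.
        return len(merged)
    # Non-discrete type with a real range: no finite bound.
    return None


def combined_bound(bounds: List[Optional[int]]) -> Optional[int]:
    """Bound for several columns: product of the per-column bounds."""
    if any(b is None for b in bounds):
        return None
    return prod(bounds)


# x = 3 or x between 5 and 7
print(ndv_upper_bound([(3, 3), (5, 7)], discrete=True))
print(ndv_upper_bound([(3.0, 3.0), (5.0, 7.0)], discrete=False))
```

With x as an INTEGER the first call gives 4; with x as a REAL the second 
gives None. combined_bound([2, 3]) gives 6, which matches the 
NDV(x, y) <= 2 * 3 example in the description.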

> Estimate the number of distinct values by filter condition
> ----------------------------------------------------------
>
>                 Key: CALCITE-4465
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4465
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: Liya Fan
>            Assignee: Liya Fan
>            Priority: Major
>
> According to our current implementation ({{RelMdDistinctRowCount}}), 
> estimating the number of distinct values (NDV) does not make good use of 
> the filter condition. It simply forwards the call to its input operator with 
> the filter condition attached.
> In fact, more information can be obtained for some special but commonly used 
> conditions. For example, given condition {{x = 'a'}}, we can deduce that 
> {{NDV( x ) <= 1}}. Given condition {{x in ('a', 'b')}}, we can deduce that 
> {{NDV( x ) <= 2}}.
> More generally, if we have {{x in ('a', 'b') AND y in ('c', 'd', 'e')}}, we 
> have {{NDV(x, y) <= 2 * 3 = 6}}.
> Thoughts?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
