[ https://issues.apache.org/jira/browse/CALCITE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161019#comment-17161019 ]

Stamatis Zampetakis commented on CALCITE-4132:
----------------------------------------------

Thanks for working on this [~fan_li_ya]. I had a look at the PR and I like the
detailed explanation of how the NDV is derived. Nevertheless, with or without
the approximation (N approaching infinity), the NDV remains an estimate.
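For reference, here is a sketch of the derivation. It assumes that each of the n draws picks one of N distinct values uniformly and independently; that assumption belongs to this sketch, and the PR's own derivation may be phrased differently:

```latex
\begin{aligned}
\Pr[v \text{ is never drawn}] &= \left(1 - \tfrac{1}{N}\right)^{n} \\
\mathbb{E}[\mathrm{NDV}] &= N\left[1 - \left(1 - \tfrac{1}{N}\right)^{n}\right] \\
&\approx N\left(1 - e^{-n/N}\right) \quad \text{for large } N,
  \text{ since } \ln\!\left(1 - \tfrac{1}{N}\right) \approx -\tfrac{1}{N}
\end{aligned}
```

Because 1 - x <= e^{-x} for all x, the large-N form never exceeds the exact expectation. It can only underestimate, and the gap closes as N grows.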

If I understand your proposal correctly, you use the same method and
assumptions as before but remove the approximation for large N. Do you have
concrete examples (queries, plans, datasets) where the approximation causes
problems? As far as I can see there are no plan differences in Calcite, so I
guess your motivation for this change comes from downstream projects.

> Estimate the NDV accurately
> ---------------------------
>
>                 Key: CALCITE-4132
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4132
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: Liya Fan
>            Assignee: Liya Fan
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, we estimate the NDV of many operators with the
> RelMdUtil#numDistinctVals method. This method estimates the expected number
> of distinct values obtained when selecting n times (with replacement) from a
> collection with N distinct values. The estimate relies on an approximation
> that holds as N approaches infinity.
> However, when N is not large, the difference between the approximate and
> exact values can be notable. In addition, the error can be magnified
> depending on the combination of N and n, which can lead the optimizer to make
> wrong decisions.
> Therefore, we compute the estimate exactly, based on an unbiased estimator
> (the proof is given in the comments).
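A minimal Java sketch follows. It compares the large-N approximation with the exact expectation, assuming uniform, independent draws with replacement. The class `NdvEstimates` and its methods are hypothetical illustrations, not Calcite's `RelMdUtil` API. The sketch also does not reproduce whatever capping or boundary handling `RelMdUtil#numDistinctVals` or the PR applies.

```java
/**
 * Hypothetical helper (not part of Calcite) comparing two ways of computing the
 * expected number of distinct values observed when n values are drawn uniformly
 * and independently, with replacement, from a domain of N distinct values.
 */
public final class NdvEstimates {
  private NdvEstimates() {}

  /** Large-N approximation: N * (1 - e^(-n/N)). Assumes N > 0 and n >= 0. */
  public static double approximate(double domainSize, double numSelected) {
    if (domainSize <= 0 || numSelected <= 0) {
      return 0;
    }
    return domainSize * (1.0 - Math.exp(-numSelected / domainSize));
  }

  /** Exact expectation: N * (1 - (1 - 1/N)^n). Assumes N >= 1 and n >= 0. */
  public static double exact(double domainSize, double numSelected) {
    if (domainSize < 1) {
      throw new IllegalArgumentException("domainSize must be >= 1");
    }
    if (numSelected <= 0) {
      return 0;
    }
    // Probability that one particular value is never drawn in n draws:
    // (1 - 1/N)^n, computed as exp(n * log1p(-1/N)) to keep precision for large N.
    double missProbability = Math.exp(numSelected * Math.log1p(-1.0 / domainSize));
    // Linearity of expectation: each of the N values is seen with probability
    // 1 - missProbability.
    return domainSize * (1.0 - missProbability);
  }

  public static void main(String[] args) {
    double[][] cases = {{2, 2}, {10, 5}, {1000, 500}, {1_000_000, 10}};
    for (double[] c : cases) {
      System.out.printf("N=%.0f n=%.0f exact=%.4f approx=%.4f%n",
          c[0], c[1], exact(c[0], c[1]), approximate(c[0], c[1]));
    }
  }
}
```

For N = 10 and n = 5, the exact expectation is 10 * (1 - 0.9^5) = 4.0951. The approximation gives about 3.9347, a gap of roughly 4% at this size. Whether such gaps change any plan is exactly the kind of concrete example the review asks for.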



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
