[ 
https://issues.apache.org/jira/browse/CALCITE-4351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17651655#comment-17651655
 ] 

Liya Fan commented on CALCITE-4351:
-----------------------------------

One method that comes to mind is to compute the value of `ln(1-1/d)` through a 
Taylor expansion. In particular, we have `ln(x) = (x-1) - (x-1)^2/2 + (x-1)^3/3 
- (x-1)^4/4 + ...`. Substituting `x = 1 - 1/d` gives `ln(1-1/d) = -1/d - 
1/(2*d^2) - 1/(3*d^3) - 1/(4*d^4) - ...` (please correct me if I am wrong). The 
terms of the series shrink very fast for a large domain size, so a few terms 
should be enough.
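
To make the idea concrete, here is a rough Java sketch of the truncated series. 
It is not the Calcite code: the class and method names are made up for 
illustration, and the formula `d * (1 - (1 - 1/d)^n)` for the expected number 
of distinct values is an assumption about what the method estimates. The use of 
`Math.expm1` to avoid a second cancellation is also my own choice. 

```java
// Hypothetical sketch, not the actual RelMdUtil implementation.
class LogSeriesSketch {

  // Truncated series: ln(1 - 1/d) = -(1/d + 1/(2*d^2) + ... + 1/(k*d^k)).
  // Only suitable for large d, where the terms shrink quickly.
  static double lnOneMinusInverse(double d, int terms) {
    double x = 1.0 / d;
    double power = x; // holds x^k
    double sum = 0.0;
    for (int k = 1; k <= terms; k++) {
      sum += power / k;
      power *= x;
    }
    return -sum;
  }

  // Assumed formula: d * (1 - (1 - 1/d)^n) = d * (1 - exp(n * ln(1 - 1/d))).
  // -expm1(y) computes 1 - exp(y) without cancellation when y is near 0.
  static double numDistinct(double d, double n, double lnTerm) {
    return d * -Math.expm1(n * lnTerm);
  }

  public static void main(String[] args) {
    double d = 1e18; // domainSize from the issue description
    double n = 1e10; // numSelected from the issue description

    // 1 - 1/d rounds to exactly 1.0 in double precision, so the log is 0.
    double naive = Math.log(1.0 - 1.0 / d);
    double series = lnOneMinusInverse(d, 4);

    System.out.println("naive:  " + numDistinct(d, n, naive));
    System.out.println("series: " + numDistinct(d, n, series));
  }
}
```

With the inputs from the issue description, the naive logarithm evaluates to 
zero, so the estimate collapses to 0, while the series version stays close to 
`numSelected`. The standard library's `Math.log1p(-1.0 / d)` computes the same 
quantity and may be a simpler alternative to a hand-written series. 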

> RelMdUtil#numDistinctVals always returns 0 for large inputs
> -----------------------------------------------------------
>
>                 Key: CALCITE-4351
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4351
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.26.0
>            Reporter: Caizhi Weng
>            Priority: Major
>
> The previous implementation of {{RelMdUtil#numDistinctVals}} used the 
> approximation {{ln(1 + x) ~= x}} when {{x}} is small.
> However, CALCITE-4132 removed this approximation to make the result more 
> accurate. This causes the function to calculate an incorrect result for large 
> inputs (for example, when {{domainSize = 1e18}} and {{numSelected = 1e10}} 
> the result is 0) due to precision problems.
> What I would suggest is to treat small and large inputs in different ways. 
> For small inputs we use the new, more precise function and for large inputs 
> we use the old, approximated function.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
