Caizhi Weng created CALCITE-4351:
------------------------------------

             Summary: The result of RelMdUtil#numDistinctVals is incorrect when 
inputs are large
                 Key: CALCITE-4351
                 URL: https://issues.apache.org/jira/browse/CALCITE-4351
             Project: Calcite
          Issue Type: Bug
          Components: core
    Affects Versions: 1.26.0
            Reporter: Caizhi Weng


The previous implementation of {{RelMdUtil#numDistinctVals}} used the 
approximation {{ln(1 + x) ~= x}} when {{x}} is small.

However, CALCITE-4132 removed this approximation to make the result more 
accurate. This causes the function to calculate an incorrect result for large 
inputs (for example, {{domainSize = 1e18}} and {{numSelected = 1e10}}) due to 
floating-point precision problems.
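
To illustrate the precision problem, here is a minimal sketch. It assumes the 
precise form computes {{D * (1 - (1 - 1/D)^N)}} through {{exp}} and {{log}}, and 
that the approximated form replaces {{ln(1 - 1/D)}} with {{-1/D}}. The class and 
method names are hypothetical and this is not the actual Calcite code.

```java
public class NumDistinctValsPrecisionDemo {
  /** Precise form (assumed): D * (1 - (1 - 1/D)^N), evaluated via exp and log. */
  static double precise(double domainSize, double numSelected) {
    double expo = numSelected * Math.log(1.0 - 1.0 / domainSize);
    return (1.0 - Math.exp(expo)) * domainSize;
  }

  /** Approximated form (assumed): ln(1 - 1/D) is replaced by -1/D. */
  static double approximate(double domainSize, double numSelected) {
    return (1.0 - Math.exp(-numSelected / domainSize)) * domainSize;
  }

  public static void main(String[] args) {
    // 1.0 - 1e-18 rounds to exactly 1.0 in double arithmetic, so the log is 0
    // and the precise form collapses to zero.
    System.out.println(precise(1e18, 1e10));     // prints 0.0
    // The approximated form stays close to numSelected, as expected when
    // numSelected is much smaller than domainSize.
    System.out.println(approximate(1e18, 1e10));
  }
}
```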

I suggest treating small and large inputs differently: for small inputs we use 
the new, more precise function, and for large inputs we use the old, 
approximated function.
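
A rough sketch of this suggestion could look like the following. It assumes the 
precise form uses the exponent {{N * ln(1 - 1/D)}} and the approximated form 
uses {{-N/D}}. The class name, method shape, and the threshold value are 
hypothetical and chosen only for illustration; the real cutoff would need to be 
decided in the fix.

```java
public class NumDistinctValsHybridSketch {
  // Hypothetical cutoff, picked only for illustration; not a tuned value.
  static final double LARGE_DOMAIN_THRESHOLD = 1e9;

  static double numDistinctVals(double domainSize, double numSelected) {
    double expo;
    if (domainSize > LARGE_DOMAIN_THRESHOLD) {
      // Large domain: 1 - 1/D is too close to 1 for the log to be reliable,
      // so fall back to the approximation ln(1 - 1/D) ~= -1/D.
      expo = -numSelected / domainSize;
    } else {
      // Small domain: the precise exponent is safe to compute.
      expo = numSelected * Math.log(1.0 - 1.0 / domainSize);
    }
    return (1.0 - Math.exp(expo)) * domainSize;
  }

  public static void main(String[] args) {
    // Small input, precise branch: 10 * (1 - 0.9^5), roughly 4.0951.
    System.out.println(numDistinctVals(10, 5));
    // Large input, approximated branch: close to 1e10 instead of 0.
    System.out.println(numDistinctVals(1e18, 1e10));
  }
}
```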



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
