Caizhi Weng created CALCITE-4351:
------------------------------------
Summary: The result of RelMdUtil#numDistinctVals is incorrect when
inputs are large
Key: CALCITE-4351
URL: https://issues.apache.org/jira/browse/CALCITE-4351
Project: Calcite
Issue Type: Bug
Components: core
Affects Versions: 1.26.0
Reporter: Caizhi Weng
The previous implementation of {{RelMdUtil#numDistinctVals}} used the approximation
{{ln(1 + x) ~= x}} when {{x}} is small.
However, CALCITE-4132 removed this approximation to make the result more
accurate. This causes the function to calculate an incorrect result for large
inputs (for example, {{domainSize = 1e18}} and {{numSelected = 1e10}}) due to
precision problems.
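The precision loss can be reproduced in isolation. The sketch below assumes the exact formula has the form {{D * (1 - exp(N * ln(1 - 1/D)))}}, which is the expression the {{ln(1 + x) ~= x}} approximation would simplify. The class and method names are hypothetical and are not Calcite's actual code.

```java
final class PrecisionLossDemo {
  /** Assumed exact form: D * (1 - exp(N * ln(1 - 1/D))). Not Calcite's code. */
  static double exact(double domainSize, double numSelected) {
    return domainSize
        * (1.0 - Math.exp(numSelected * Math.log(1.0 - 1.0 / domainSize)));
  }

  public static void main(String[] args) {
    // 1 / 1e18 is far smaller than the spacing of doubles just below 1.0
    // (about 1.1e-16), so the subtraction rounds back to exactly 1.0.
    System.out.println(1.0 - 1.0 / 1e18 == 1.0);

    // ln(1.0) is 0, so the whole estimate collapses to 0 instead of a
    // value close to numSelected.
    System.out.println(exact(1e18, 1e10));
  }
}
```

With {{domainSize = 1e18}} the term {{1 - 1/D}} is no longer distinguishable from 1, so every later step works on a value that has already lost the information it needed.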
I suggest treating small and large inputs differently: for small inputs we use
the new, more precise function, and for large inputs we use the old,
approximated function.
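A minimal sketch of this split, under stated assumptions: the exact branch is taken to be {{D * (1 - exp(N * ln(1 - 1/D)))}} and the approximated branch {{D * (1 - exp(-N/D))}}. The class name, method names and the threshold value are hypothetical and were picked only for illustration. {{Math.expm1}} is used in the approximated branch to avoid cancellation when {{N/D}} is tiny; that is a choice made for this sketch, not something the issue prescribes.

```java
final class NumDistinctValsSketch {
  // Hypothetical cutoff between the two branches; not taken from Calcite.
  static final double LARGE_DOMAIN_THRESHOLD = 1e9;

  /** Assumed exact form: D * (1 - exp(N * ln(1 - 1/D))). */
  static double exact(double d, double n) {
    return d * (1.0 - Math.exp(n * Math.log(1.0 - 1.0 / d)));
  }

  /** Approximated form, using ln(1 - 1/D) ~= -1/D: D * (1 - exp(-N/D)). */
  static double approximate(double d, double n) {
    // expm1(x) computes exp(x) - 1 accurately for x close to 0.
    return d * -Math.expm1(-n / d);
  }

  /** Picks a branch by domain size, then clamps to [0, min(D, N)]. */
  static double numDistinctVals(double d, double n) {
    if (d <= 0) {
      return 0;
    }
    double res = d < LARGE_DOMAIN_THRESHOLD ? exact(d, n) : approximate(d, n);
    return Math.max(0, Math.min(res, Math.min(d, n)));
  }

  public static void main(String[] args) {
    // Small domain: handled by the exact branch.
    System.out.println(numDistinctVals(10, 1));
    // Large domain from the report: handled by the approximated branch.
    System.out.println(numDistinctVals(1e18, 1e10));
  }
}
```

For the inputs from the report, the approximated branch returns a value just below {{1e10}}, which is what one would expect when the number of selected rows is tiny compared to the domain. Where exactly to put the cutoff is an open design question; the value above only needs to keep {{1/D}} well above double rounding error in the exact branch.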