[
https://issues.apache.org/jira/browse/IMPALA-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Qifan Chen resolved IMPALA-2658.
--------------------------------
Fix Version/s: Impala 4.0
Resolution: Fixed
> Extend the NDV function to accept a precision
> ---------------------------------------------
>
> Key: IMPALA-2658
> URL: https://issues.apache.org/jira/browse/IMPALA-2658
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 2.2.4
> Reporter: Peter Ebert
> Assignee: Qifan Chen
> Priority: Minor
> Labels: ramp-up
> Fix For: Impala 4.0
>
> Attachments: Comparison of HLL Memory usage, Query Duration and
> Accuracy.jpg
>
>
> The HyperLogLog algorithm used by NDV defaults to a precision of 10. Being able
> to set this precision would have two benefits:
> # Lower precision can improve performance: a precision of 9 uses half as many
> registers as 10 (the register count is 2^precision) and may be just as accurate,
> depending on the expected cardinality.
> # Higher precision can help with very large cardinalities (100 million to
> billion range) and will typically provide more accurate estimates. Those who are
> presenting estimates to end users will likely be willing to trade some
> performance for more accuracy, while still outperforming the naive
> approach by a large margin.
> Proposal: add the overloaded function NDV(expression, int precision)
> with an accepted range of 4 to 18 inclusive.
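The trade-off described above can be sketched numerically. The snippet below is an illustration only, not Impala's implementation: it assumes the textbook HyperLogLog relationships (2^precision registers and a relative standard error of roughly 1.04 / sqrt(registers)), and the helper names `hll_registers` and `hll_relative_error` are hypothetical.

```python
import math

# Accepted precision range proposed in the issue description.
MIN_PRECISION = 4
MAX_PRECISION = 18


def hll_registers(precision: int) -> int:
    """Number of HyperLogLog registers for a given precision (2^precision)."""
    if not MIN_PRECISION <= precision <= MAX_PRECISION:
        raise ValueError(
            f"precision must be between {MIN_PRECISION} and {MAX_PRECISION}"
        )
    return 1 << precision


def hll_relative_error(precision: int) -> float:
    """Textbook HyperLogLog relative standard error: 1.04 / sqrt(registers).

    Assumption: this is the generic HLL estimate, not a measured Impala figure.
    """
    return 1.04 / math.sqrt(hll_registers(precision))


if __name__ == "__main__":
    # Compare the lowest, a reduced, the default, and the highest precision.
    for p in (4, 9, 10, 18):
        print(
            f"precision={p:2d} registers={hll_registers(p):6d} "
            f"approx_rel_error={hll_relative_error(p):.4%}"
        )
```

Under these assumptions, each step up in precision doubles the register count while shrinking the expected error by only a factor of about 1.41, which is why lower precisions can be attractive for small cardinalities.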