[
https://issues.apache.org/jira/browse/IMPALA-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046001#comment-17046001
]
ASF subversion and git services commented on IMPALA-8759:
---------------------------------------------------------
Commit 322483987600fb3cf84da21a47cadccb6989820b in impala's branch
refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3224839 ]
IMPALA-8759: Use double precision for HLL finalize function
The current HLL finalize function uses the single-precision data
type float32 to calculate the estimate. This is not accurate for
larger cardinalities beyond 1,000,000, since float32 has only 6~7
decimal digits of precision.
This patch changes the single-precision data type to a
double-precision type in the HLL finalize function.
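
The precision limit described above can be illustrated with a minimal
sketch. This is not the Impala finalize code: the helper names
RoundTripFloat and RoundTripDouble are hypothetical, and the sketch
assumes IEEE 754 float/double with the default round-to-nearest mode.
It only shows what happens when an integer-valued estimate passes
through each type before being returned as a 64-bit integer.

```cpp
#include <cstdint>

// Hypothetical helper (not from Impala): convert an integer estimate to
// float32 and back, as a single-precision finalize step would have to.
// float32 has a 24-bit significand, so integers above 2^24 = 16,777,216
// are rounded to the nearest representable value.
int64_t RoundTripFloat(int64_t estimate) {
  return static_cast<int64_t>(static_cast<float>(estimate));
}

// Hypothetical helper (not from Impala): the same round trip through
// double, whose 53-bit significand represents every integer up to 2^53
// exactly.
int64_t RoundTripDouble(int64_t estimate) {
  return static_cast<int64_t>(static_cast<double>(estimate));
}
```

Under those assumptions, RoundTripFloat(123456789) yields 123456792,
while RoundTripDouble(123456789) returns the input unchanged. Rounding
in the intermediate arithmetic of the estimate can add further error
on top of this final conversion, which the sketch does not model.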
Testing:
- Passed all exhaustive tests.
- Benchmarked queries that use NDV functions. The performance
impact is negligible.
See the following spreadsheet for the benchmark results:
https://docs.google.com/spreadsheets/d/1DIVOEs5C4MJL1b7O4MA_jkaM3Y-JSMFREjXCUHJ3eHc/edit#gid=0
Change-Id: I0c5a5229b682070b0bc14da287db5231159dbb3d
Reviewed-on: http://gerrit.cloudera.org:8080/15167
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Use double precision for HLL
> ----------------------------
>
> Key: IMPALA-8759
> URL: https://issues.apache.org/jira/browse/IMPALA-8759
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 3.2.0
> Reporter: Peter Ebert
> Assignee: Wenzhe Zhou
> Priority: Major
> Labels: perf, ramp-up
> Fix For: Impala 3.4.0
>
>
> In /be/src/exprs/aggregate-functions-ir.cc, the HLL finalize function uses a
> float, which is only capable of 6-9 digits of precision. More accurate
> estimates for larger cardinalities (beyond 999,999) should be possible with
> double precision. Another C++ implementation uses double as well:
> [https://github.com/dialtr/libcount]