[ https://issues.apache.org/jira/browse/IMPALA-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenzhe Zhou resolved IMPALA-8759.
---------------------------------
Resolution: Fixed
The Impala binary was built as a release build. The tests were run against the
tpch.lineitem table, loaded at scale factor 150 (900,035,147 rows in total).
Query times were measured from impala-shell for queries such as "select
ndv(col_name) from tpch150_parquet.lineitem".
> Use double precision for HLL
> ----------------------------
>
> Key: IMPALA-8759
> URL: https://issues.apache.org/jira/browse/IMPALA-8759
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 3.2.0
> Reporter: Peter Ebert
> Assignee: Wenzhe Zhou
> Priority: Major
> Labels: perf, ramp-up
>
> In /be/src/exprs/aggregate-functions-ir.cc, the finalize function uses a
> float, which provides only 6-9 significant decimal digits of precision. More
> accurate estimates for larger cardinalities (beyond 999,999) should be
> possible with double precision. Another C++ implementation, libcount, uses
> double as well: [https://github.com/dialtr/libcount]
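To illustrate the precision concern, here is a minimal C++ sketch. It is not Impala's code, and the function names RoundTrip and HllEstimate are hypothetical. It first shows that float, with its 24-bit significand, represents every integer exactly only up to 2^24 = 16,777,216, so a count like the 900,035,147 rows above does not survive a round trip through float. It then implements the textbook HyperLogLog raw estimate with small-range (linear counting) correction, templated on the floating-point type so that float and double intermediates can be compared. It assumes at least 128 registers and 64-bit hash values, so the large-range correction from the original HLL paper is omitted.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper: converts an exact count to floating type T and back.
// float has a 24-bit significand, so not every integer above 2^24 survives.
template <typename T>
std::int64_t RoundTrip(std::int64_t count) {
  return static_cast<std::int64_t>(static_cast<T>(count));
}

// Hypothetical textbook HyperLogLog estimator (Flajolet et al.), templated on
// the floating-point type used for the intermediate math. A sketch only,
// not Impala's finalize function. Assumes 64-bit hashes (no large-range fix).
template <typename T>
std::int64_t HllEstimate(const std::vector<std::uint8_t>& registers) {
  assert(registers.size() >= 128);  // the alpha formula below assumes m >= 128
  const T m = static_cast<T>(registers.size());
  const T alpha = T(0.7213) / (T(1) + T(1.079) / m);

  T sum = 0;       // sum of 2^-M[j] over all registers
  int zeros = 0;   // number of registers that never saw a value
  for (std::uint8_t r : registers) {
    sum += std::ldexp(T(1), -static_cast<int>(r));
    if (r == 0) ++zeros;
  }
  T estimate = alpha * m * m / sum;  // raw HLL estimate

  // Small-range correction: fall back to linear counting.
  if (estimate <= T(2.5) * m && zeros != 0) {
    estimate = m * std::log(m / static_cast<T>(zeros));
  }
  return static_cast<std::int64_t>(estimate);
}
```

For example, RoundTrip<float>(900035147) yields 900,035,136, while RoundTrip<double>(900035147) returns the count unchanged. Templating HllEstimate on T keeps a single code path, so switching the intermediate type from float to double is a one-token change.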
--
This message was sent by Atlassian Jira
(v8.3.4#803005)