[
https://issues.apache.org/jira/browse/ARROW-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527844#comment-17527844
]
ZMZ91 commented on ARROW-14158:
-------------------------------
Sure. We'd like to have a hash_count_distinct_hll kernel for an approximate result in
many real-world cases.
> [C++][Compute] Implement count distinct kernel using HyperLogLog
> ----------------------------------------------------------------
>
> Key: ARROW-14158
> URL: https://issues.apache.org/jira/browse/ARROW-14158
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Affects Versions: 7.0.0
> Reporter: Percy Camilo Triveño Aucahuasi
> Priority: Major
> Labels: Kernels, kernel
>
> Having a version of the aggregation kernel count distinct using HyperLogLog
> may be useful.
> Note: The implementation should support the merge operator.
> cc [~icook] [~lidavidm]
> Some resources/links:
> [http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf]
> [https://engineering.fb.com/2018/12/13/data-infrastructure/hyperloglog/]
> [https://github.com/facebookincubator/velox/tree/main/velox/aggregates/hyperloglog]
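
For illustration only, here is a minimal sketch of the idea in the issue: a HyperLogLog counter whose partial states can be combined with a merge operator, which is what lets a per-batch or per-group aggregation be reduced afterwards. It is written in Python rather than C++ to keep it short, and it is not Arrow's or Velox's implementation. The class name HyperLogLog, the precision parameter p, and the choice of BLAKE2b as the 64-bit hash are all assumptions made for this example. The bias constant is the one given for m >= 128 in the Flajolet et al. paper linked above.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog sketch (hypothetical; not an Arrow API)."""

    def __init__(self, p=12):
        if not 7 <= p <= 16:
            raise ValueError("p must be in [7, 16] for the alpha formula used here")
        self.p = p                      # number of bits used as register index
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, value):
        # Assumption: a 64-bit BLAKE2b digest of repr(value) is a good enough hash.
        digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        width = 64 - self.p
        idx = h >> width                # top p bits select the register
        rest = h & ((1 << width) - 1)   # remaining bits feed the rank
        # Rank = 1-based position of the leftmost 1-bit in `rest`
        # (width + 1 when `rest` is zero).
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # Merge operator: element-wise maximum of the two register arrays.
        if self.p != other.p:
            raise ValueError("cannot merge sketches with different precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


# Usage: build two partial sketches over overlapping inputs, then merge them.
left, right = HyperLogLog(), HyperLogLog()
for v in range(0, 60_000):
    left.add(v)
for v in range(40_000, 100_000):
    right.add(v)
left.merge(right)
approx_distinct = left.estimate()   # approximates the 100,000 distinct values
```

Because merging takes the element-wise maximum of the registers, the merged sketch is identical to one built over the union of the inputs, so values seen by both partial sketches are not double counted. With p = 12 the expected relative error is about 1.04 / sqrt(4096), or roughly 1.6%.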