[
https://issues.apache.org/jira/browse/PARQUET-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17202319#comment-17202319
]
Mukul Sabharwal edited comment on PARQUET-42 at 9/25/20, 5:45 PM:
------------------------------------------------------------------
It would be nice to standardize it. TDigest would also be very useful for
quantile estimation. Its merge operation is commutative and associative as well.
was (Author: mjsabby):
It would be nice standardize it. TDigest would also be very useful for quantile
estimation. It is commutative and associative as well.
> Add HyperLogLog / CountMinSketch to parquet statistics
> ------------------------------------------------------
>
> Key: PARQUET-42
> URL: https://issues.apache.org/jira/browse/PARQUET-42
> Project: Parquet
> Issue Type: New Feature
> Components: parquet-mr
> Reporter: Alex Levenson
> Priority: Minor
>
> HLL and CMS sketches stored per rowgroup could help with query planning
> (getting a sense of data skew) and with cheaply counting approximate distinct
> values. Merging either kind of sketch is commutative, which means
> per-rowgroup sketches can be combined across rowgroups (unlike exact distinct
> counts, which cannot simply be added together).
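
To illustrate the mergeability argument in the description, here is a minimal
Python sketch of a Count-Min Sketch whose merge is an element-wise sum. It is
not parquet-mr code and not a proposed format: the class name, the `sketch_of`
helper, the width/depth values, and the salted BLAKE2b hashing are all
assumptions chosen for illustration.

```python
import hashlib


class CountMinSketch:
    """Toy Count-Min Sketch; width and depth are illustrative, not tuned."""

    def __init__(self, width=64, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, value):
        # Derive one hash function per row by salting with the row number.
        digest = hashlib.blake2b(
            repr(value).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, value, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, value)] += count

    def estimate(self, value):
        # Collisions can only inflate a counter, so the minimum over rows
        # is an upper bound on the true frequency.
        return min(
            self.table[row][self._index(row, value)]
            for row in range(self.depth)
        )

    def merge(self, other):
        # Element-wise addition; both sketches must share the same shape
        # (and, implicitly, the same hash functions).
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketches must share width and depth")
        merged = CountMinSketch(self.width, self.depth)
        merged.table = [
            [a + b for a, b in zip(r1, r2)]
            for r1, r2 in zip(self.table, other.table)
        ]
        return merged


def sketch_of(values):
    """Hypothetical helper: build a sketch for one rowgroup's values."""
    sketch = CountMinSketch()
    for v in values:
        sketch.add(v)
    return sketch


# Two made-up rowgroups of a single column.
row_group_1 = ["a", "b", "a", "c"]
row_group_2 = ["a", "d", "d"]

s1 = sketch_of(row_group_1)
s2 = sketch_of(row_group_2)
whole = sketch_of(row_group_1 + row_group_2)

# Merge order does not matter, and the merged sketch is identical to one
# built over all the rows at once.
print(s1.merge(s2).table == s2.merge(s1).table == whole.table)
```

Because the merge is plain addition of counters, a reader could combine
per-rowgroup sketches in any order without rescanning the data. Exact distinct
counts have no such property, since values shared between rowgroups would be
counted twice.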