[ https://issues.apache.org/jira/browse/PARQUET-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17202319#comment-17202319 ]

Mukul Sabharwal edited comment on PARQUET-42 at 9/25/20, 5:45 PM:
------------------------------------------------------------------

It would be nice to standardize it. TDigest would also be very useful for 
quantile estimation. It is commutative and associative as well.


was (Author: mjsabby):
It would be nice standardize it. TDigest would also be very useful for quantile 
estimation. It is commutative and associative as well.

> Add HyperLogLog / CountMinSketch to parquet statistics
> ------------------------------------------------------
>
>                 Key: PARQUET-42
>                 URL: https://issues.apache.org/jira/browse/PARQUET-42
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-mr
>            Reporter: Alex Levenson
>            Priority: Minor
>
> HLL and CMS for row groups could help with query planning (getting a sense of 
> data skew) and with cheaply counting approximate distinct values. Both are 
> commutative, which means they can be combined across row groups (unlike an 
> exact distinct count, for example).
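A minimal Python sketch of the mergeability point in the description, assuming toy HyperLogLog and count-min sketch implementations. The classes, the hashing scheme (blake2b over str(value)) and the two row groups are hypothetical illustrations, not parquet-mr APIs. Per-row-group HLLs combine with a register-wise max and CMS tables with a cell-wise sum, while summing exact per-row-group distinct counts double-counts values that appear in both row groups.

```python
import hashlib
import math


def _hash64(value) -> int:
    """64-bit hash of str(value); this keying scheme is an assumption of the sketch."""
    digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class HyperLogLog:
    """Toy HyperLogLog with 2**p registers (p=12 gives 4096 registers)."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value) -> None:
        h = _hash64(value)
        idx = h >> (64 - self.p)               # top p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # 1-based position of leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        # Register-wise max: commutative and associative, so row groups merge in any order.
        assert self.p == other.p
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # linear counting for small cardinalities
        return raw


class CountMinSketch:
    """Toy count-min sketch; both sides of a merge must share width, depth and hashing."""

    def __init__(self, width: int = 2048, depth: int = 4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _col(self, row: int, value) -> int:
        return _hash64(f"{row}:{value}") % self.width  # per-row hash via a row prefix

    def add(self, value, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._col(row, value)] += count

    def merge(self, other: "CountMinSketch") -> "CountMinSketch":
        # Cell-wise sum: also commutative and associative.
        assert (self.width, self.depth) == (other.width, other.depth)
        out = CountMinSketch(self.width, self.depth)
        out.table = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(self.table, other.table)]
        return out

    def estimate(self, value) -> int:
        # Minimum over rows; never underestimates the true count.
        return min(self.table[row][self._col(row, value)] for row in range(self.depth))


# Two hypothetical row groups whose distinct values overlap (user4000..user5999).
row_group_1 = [f"user{i}" for i in range(0, 6000)] + ["hot"] * 500
row_group_2 = [f"user{i}" for i in range(4000, 10000)] + ["hot"] * 300

hlls, cmss = [], []
for rows in (row_group_1, row_group_2):
    hll, cms = HyperLogLog(), CountMinSketch()
    for v in rows:
        hll.add(v)
        cms.add(v)
    hlls.append(hll)
    cmss.append(cms)

# Summing exact per-row-group distinct counts double-counts the overlap.
summed_exact = len(set(row_group_1)) + len(set(row_group_2))
merged_hll = hlls[0].merge(hlls[1])
merged_cms = cmss[0].merge(cmss[1])
print(summed_exact, round(merged_hll.estimate()), merged_cms.estimate("hot"))
```

Here the true number of distinct values across both row groups is 10001, but the summed exact counts give 12002. The merged HLL should land near 10001, and the merged CMS estimate for "hot" is at least its true total of 800. The design choice being illustrated is that both merge operations are order-independent element-wise operations, which is what lets per-row-group sketches be stored once and combined at query-planning time.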



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
