[ 
https://issues.apache.org/jira/browse/PHOENIX-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062445#comment-16062445
 ] 

Julian Hyde commented on PHOENIX-3390:
--------------------------------------

[~gjacoby], thanks for the heads-up. Even more pertinent than my data profiling 
work (which is about the problem of computing 2^n approximate distinct-counts 
simultaneously, see CALCITE-1616) is people's requirement for fast, 
approximate distinct counts. Druid supports various sketches, and we wish to 
surface them in Calcite's Druid adapter (see CALCITE-1787 theta-sketch, 
CALCITE-1587 top-N, CALCITE-1853 knowing when approximate count-distinct is 
acceptable).

Today many databases have a syntax for approximate aggregates, but 
unfortunately the syntaxes are rarely the same and are often too closely 
coupled to a particular algorithm (e.g. HyperLogLog). I have logged CALCITE-1588 
to introduce an {{APPROXIMATE}} clause, e.g. {{COUNT(DISTINCT customerId) 
APPROXIMATE (WITHIN 10 PERCENT)}}. It would be great if Phoenix adopted 
that syntax.

> Custom UDAF for HyperLogLogPlus
> -------------------------------
>
>                 Key: PHOENIX-3390
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3390
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Swapna Kasula
>            Assignee: Ethan Wang
>            Priority: Minor
>
> With ref # PHOENIX-2069
> Custom UDAF to aggregate/union the HyperLogLogs of a column and return a 
> HyperLogLog.
> select hllUnion(col1) from table;  //returns a HyperLogLog, which is the 
> union of all HyperLogLogs from all rows for column 'col1'
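
The union described in the issue can be illustrated with a minimal sketch. It assumes each row's sketch is a dense array of HyperLogLog registers built with the same precision and hash function; under that assumption the union is the per-register maximum. The class and method names ({{HllUnionSketch}}, {{union}}, {{unionAll}}) are hypothetical and are not Phoenix's API or that of any HyperLogLogPlus library, and the sparse encoding that HyperLogLogPlus also uses is not handled here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: not Phoenix's API and not a real
// HyperLogLogPlus implementation. Assumes dense registers of equal length.
final class HllUnionSketch {

    // Merge two register arrays by taking the per-register maximum.
    static byte[] union(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("sketches must use the same precision");
        }
        byte[] out = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = (byte) Math.max(a[i], b[i]);
        }
        return out;
    }

    // Fold union over the sketches of all rows, as an aggregate function would.
    static byte[] unionAll(List<byte[]> rows) {
        if (rows.isEmpty()) {
            throw new IllegalArgumentException("no sketches to merge");
        }
        byte[] acc = rows.get(0).clone();
        for (int i = 1; i < rows.size(); i++) {
            acc = union(acc, rows.get(i));
        }
        return acc;
    }

    public static void main(String[] args) {
        byte[] merged = unionAll(List.of(
                new byte[] {1, 5, 0, 3},
                new byte[] {2, 4, 0, 7}));
        System.out.println(Arrays.toString(merged));
    }
}
```

Because the per-register maximum is commutative and associative, the merge order does not matter, which is what makes the operation suitable for an aggregate evaluated in parallel over partitions.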



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
