[ 
https://issues.apache.org/jira/browse/CALCITE-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16107842#comment-16107842
 ] 

Julian Hyde edited comment on CALCITE-1588 at 7/31/17 7:40 PM:
---------------------------------------------------------------

As [~gian] points out, Oracle, BigQuery and MemSQL support 
{{APPROX_COUNT_DISTINCT}}. I also see it in VoltDB.

A quick survey of other databases:
* Vertica has {{APPROXIMATE_COUNT_DISTINCT}}
* Redshift has {{[ APPROXIMATE ] COUNT ( [ DISTINCT | ALL ] * | expression )}}.
* In PostgreSQL you can bolt on your own hyperloglog function, but there 
doesn't seem to be a unified approach.
* I don't see anything in DB2 or MySQL.

I think that amounts to enough of a de facto standard to justify supporting 
{{APPROX_COUNT_DISTINCT}} in Calcite.

We should also support an APPROXIMATE clause (for an aggregate function and for 
SELECT). {{APPROX_COUNT_DISTINCT\(x)}} would be syntactic sugar for 
{{COUNT(DISTINCT x) APPROXIMATE ()}}.

I propose that we do {{APPROX_COUNT_DISTINCT}} first; don't yet add parser 
support for {{APPROXIMATE}}, but do add an {{approximate}} field to 
{{AggregateCall}}.
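
To make the proposal concrete, here is a rough sketch of how the sugar could map onto a call object that carries an {{approximate}} flag. Everything in it is an assumption for illustration: {{ApproxAggSketch}}, {{AggCall}} and {{desugar}} are hypothetical names, and the class is a pared-down stand-in, not Calcite's actual {{AggregateCall}}.

```java
import java.util.Locale;
import java.util.Objects;

/** Illustration only: a simplified stand-in, not Calcite's AggregateCall. */
final class ApproxAggSketch {
  /** Hypothetical, pared-down aggregate call carrying the proposed flag. */
  static final class AggCall {
    final String function;     // e.g. "COUNT"
    final boolean distinct;    // true for COUNT(DISTINCT x)
    final boolean approximate; // proposed field; false means exact results
    final String arg;

    AggCall(String function, boolean distinct, boolean approximate, String arg) {
      this.function = Objects.requireNonNull(function);
      this.distinct = distinct;
      this.approximate = approximate;
      this.arg = Objects.requireNonNull(arg);
    }

    /** Renders the call using the clause syntax floated in this comment. */
    @Override public String toString() {
      String s = function + "(" + (distinct ? "DISTINCT " : "") + arg + ")";
      return approximate ? s + " APPROXIMATE ()" : s;
    }
  }

  /** Treats APPROX_COUNT_DISTINCT(arg) as COUNT(DISTINCT arg) with approximate = true. */
  static AggCall desugar(String function, String arg) {
    if (function.equalsIgnoreCase("APPROX_COUNT_DISTINCT")) {
      return new AggCall("COUNT", true, true, arg);
    }
    // Any other function stays exact and non-distinct in this sketch.
    return new AggCall(function.toUpperCase(Locale.ROOT), false, false, arg);
  }
}
```

With this shape, {{desugar("APPROX_COUNT_DISTINCT", "x")}} yields a {{COUNT}} call with both {{distinct}} and {{approximate}} set, so the function-name form can be supported without parser support for the {{APPROXIMATE}} clause, and the flag is already in place when the clause is added later.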


> Add SQL syntax to allow approximate LIMIT and distinct-COUNT
> ------------------------------------------------------------
>
>                 Key: CALCITE-1588
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1588
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Julian Hyde
>
> Add SQL syntax to allow approximate LIMIT and distinct-COUNT. These will set 
> the properties specified in CALCITE-1587. By default the properties are 
> false, so the query will return exact results.
> Exact syntax is to be decided. It could be at the top of the query (therefore 
> affecting every LIMIT or aggregate in the query) or it could be more 
> localized (e.g. {{COUNT(DISTINCT customerId) APPROXIMATE (WITHIN 10 
> PERCENT)}}).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
