Re: ElasticSearch Adapter. converting APPROX_COUNT_DISTINCT into Elastic cardinality

Julian Hyde Tue, 22 Jan 2019 15:50:30 -0800

Ah, I’d forgotten about Table.rolledUpColumnValidInsideAgg (for 
https://issues.apache.org/jira/browse/CALCITE-1787 
<https://issues.apache.org/jira/browse/CALCITE-1787>). Yes, Elastic doesn’t 
have to deal with that complexity.


Since you throw for non-approximate distinct-count, I think it will be fine. 
You COULD keep AggregateExpandDistinctAggregatesRule around to handle such 
aggregates but I think your users would prefer that you throw.

Julian



> On Jan 22, 2019, at 3:30 PM, Andrei Sereda <[email protected]> wrote:
> 
> …. require similar things from Calcite; if so, the code should be similar,
> right?
> 
> Druid uses Table.rolledUpColumnValidInsideAgg function instead of
> explicitly removing AggregateExpandDistinctAggregatesRule plus a few custom
> rules in DruidRules
> <https://github.com/apache/calcite/blob/master/druid/src/main/java/org/apache/calcite/adapter/druid/DruidRules.java>
> .
> 
> In your PR, what happens if someone executes a non-approximate
> COUNT(DISTINCT …) against Elastic adapter? It looks as if it becomes a
> non-distinct COUNT. That doesn’t seem right.
> 
> Unfortunately elastic doesn’t have precise distinct count (see Support for
> precise cardinality aggregation
> <https://github.com/elastic/elasticsearch/issues/15876>). Currently I’m
> throwing InvalidRelException for non-approximate distinct counts
> <https://github.com/apache/calcite/pull/1008/files#diff-80f1c109f5b5c10a136181eb650ff454R75>
> :
> 
> if (aggCall.isDistinct() && !aggCall.isApproximate()) {
>  final String message = String.format(Locale.ROOT, "Only approximate distinct 
> "
>      + "aggregations are supported in Elastic (cardinality
> aggregation). Use %s function",
>      SqlStdOperatorTable.APPROX_COUNT_DISTINCT.getName());
>  throw new InvalidRelException(message);
> }
> 
> Should (Can) it be handled differently ?
> 
> On Tue, Jan 22, 2019 at 5:44 PM Julian Hyde <[email protected]> wrote:
> 
>> Yes, I think it makes sense to disable
>> AggregateExpandDistinctAggregatesRule when optimizing for Elastic.
>> 
>> I’m surprised that Druid’s logic is too complicated. If believe that
>> Druid’s capabilities for approximate distinct-count are similar to Elastic,
>> and I believe they require similar things from Calcite; if so, the code
>> should be similar, right?
>> 
>> In your PR, what happens if someone executes a non-approximate
>> COUNT(DISTINCT …) against Elastic adapter? It looks as if it becomes a
>> non-distinct COUNT. That doesn’t seem right.
>> 
>> Julian
>> 
>> 
>> 
>>> On Jan 22, 2019, at 1:22 PM, Andrei Sereda <[email protected]> wrote:
>>> 
>>> Does it make sense to remove this rule from the planner during the
>>> optimization of queries to Elastic?
>>> 
>>> Thanks for the hint, Stamatis. I did remove
>>> AggregateExpandDistinctAggregatesRule from elastic planner and things are
>>> working.
>>> For more info see PR-1008 <https://github.com/apache/calcite/pull/1008>.
>>> 
>>> On Mon, Jan 21, 2019 at 3:24 AM Stamatis Zampetakis <[email protected]>
>>> wrote:
>>> 
>>>> Hi Andrei,
>>>> 
>>>> From what you say it seems that if AggregateExpandDistinctAggregatesRule
>>>> was not applied you wouldn't have a problem translating this to Elastic.
>>>> Does it make sense to remove this rule from the planner during the
>>>> optimization of queries to Elastic?
>>>> 
>>>> Best,
>>>> Stamatis
>>>> 
>>>> Στις Σάβ, 19 Ιαν 2019 στις 1:35 π.μ., ο/η Andrei Sereda
>> <[email protected]>
>>>> έγραψε:
>>>> 
>>>>> Hello,
>>>>> 
>>>>> I’m trying to push-down SQL APPROX_COUNT_DISTINCT() function into
>> elastic
>>>>> as cardinality
>>>>> <
>>>>> 
>>>> 
>> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html
>>>>>> 
>>>>> aggregation.
>>>>> Example of SQL
>>>>> 
>>>>> select col1, APPROX_COUNT_DISTINCT(col2) from elastic group by col1
>>>>> 
>>>>> Above gets converted into the following plan (edited to make more
>>>> readable)
>>>>> :
>>>>> 
>>>>> ElasticsearchToEnumerableConverter
>>>>> ElasticsearchAggregate(group=[{0}], EXPR$1=[COUNT($1)])
>>>>>   ElasticsearchAggregate(group=[{0, 1}])
>>>>>     ElasticsearchProject(EXPR$0=[CAST(ITEM($0, 'col1'))
>>>>> EXPR$1=[CAST(ITEM($0, 'col2'))])
>>>>>       ElasticsearchTableScan(table=[[elastic, zips]])
>>>>> 
>>>>> I presume AggregateExpandDistinctAggregatesRule creates two
>> aggregations
>>>> ?
>>>>> If so, what is the correct / recommended way to identify those as
>>>>> originated from APPROX_COUNT_DISTINCT in ElasticSearchAggregate
>>>>> <
>>>>> 
>>>> 
>> https://github.com/apache/calcite/blob/master/elasticsearch/src/main/java/org/apache/calcite/adapter/elasticsearch/ElasticsearchAggregate.java
>>>>>> 
>>>>> ? Note no distinct in first aggregation.
>>>>> 
>>>>> Alos note that when multiple columns are used for approx count (select
>>>>> approx(c1), approx(c2)) there is just a single ElasticsearchAggregate
>> so
>>>> it
>>>>> is not an issue (since use of cardinality can be inferred from
>>>>> AggregateCall.isDistinct / isApproximate flags).
>>>>> 
>>>>> Druid adapter has some logic around APPROX_COUNT_DISTINCT() it but it
>>>> looks
>>>>> too complicated.
>>>>> 
>>>>> Any hints would be appreciated.
>>>>> 
>>>>> Regards,
>>>>> Andrei.
>>>>> 
>>>> 
>> 
>>

Re: ElasticSearch Adapter. converting APPROX_COUNT_DISTINCT into Elastic cardinality

Reply via email to