Yes, I think it makes sense to disable AggregateExpandDistinctAggregatesRule 
when optimizing for Elastic.

I’m surprised that Druid’s logic is too complicated. If believe that Druid’s 
capabilities for approximate distinct-count are similar to Elastic, and I 
believe they require similar things from Calcite; if so, the code should be 
similar, right?

In your PR, what happens if someone executes a non-approximate COUNT(DISTINCT 
…) against Elastic adapter? It looks as if it becomes a non-distinct COUNT. 
That doesn’t seem right.

Julian



> On Jan 22, 2019, at 1:22 PM, Andrei Sereda <[email protected]> wrote:
> 
> Does it make sense to remove this rule from the planner during the
> optimization of queries to Elastic?
> 
> Thanks for the hint, Stamatis. I did remove
> AggregateExpandDistinctAggregatesRule from elastic planner and things are
> working.
> For more info see PR-1008 <https://github.com/apache/calcite/pull/1008>.
> 
> On Mon, Jan 21, 2019 at 3:24 AM Stamatis Zampetakis <[email protected]>
> wrote:
> 
>> Hi Andrei,
>> 
>> From what you say it seems that if AggregateExpandDistinctAggregatesRule
>> was not applied you wouldn't have a problem translating this to Elastic.
>> Does it make sense to remove this rule from the planner during the
>> optimization of queries to Elastic?
>> 
>> Best,
>> Stamatis
>> 
>> Στις Σάβ, 19 Ιαν 2019 στις 1:35 π.μ., ο/η Andrei Sereda <[email protected]>
>> έγραψε:
>> 
>>> Hello,
>>> 
>>> I’m trying to push-down SQL APPROX_COUNT_DISTINCT() function into elastic
>>> as cardinality
>>> <
>>> 
>> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html
>>>> 
>>> aggregation.
>>> Example of SQL
>>> 
>>> select col1, APPROX_COUNT_DISTINCT(col2) from elastic group by col1
>>> 
>>> Above gets converted into the following plan (edited to make more
>> readable)
>>> :
>>> 
>>> ElasticsearchToEnumerableConverter
>>>  ElasticsearchAggregate(group=[{0}], EXPR$1=[COUNT($1)])
>>>    ElasticsearchAggregate(group=[{0, 1}])
>>>      ElasticsearchProject(EXPR$0=[CAST(ITEM($0, 'col1'))
>>> EXPR$1=[CAST(ITEM($0, 'col2'))])
>>>        ElasticsearchTableScan(table=[[elastic, zips]])
>>> 
>>> I presume AggregateExpandDistinctAggregatesRule creates two aggregations
>> ?
>>> If so, what is the correct / recommended way to identify those as
>>> originated from APPROX_COUNT_DISTINCT in ElasticSearchAggregate
>>> <
>>> 
>> https://github.com/apache/calcite/blob/master/elasticsearch/src/main/java/org/apache/calcite/adapter/elasticsearch/ElasticsearchAggregate.java
>>>> 
>>> ? Note no distinct in first aggregation.
>>> 
>>> Alos note that when multiple columns are used for approx count (select
>>> approx(c1), approx(c2)) there is just a single ElasticsearchAggregate so
>> it
>>> is not an issue (since use of cardinality can be inferred from
>>> AggregateCall.isDistinct / isApproximate flags).
>>> 
>>> Druid adapter has some logic around APPROX_COUNT_DISTINCT() it but it
>> looks
>>> too complicated.
>>> 
>>> Any hints would be appreciated.
>>> 
>>> Regards,
>>> Andrei.
>>> 
>> 

Reply via email to