Yes, I think it makes sense to disable AggregateExpandDistinctAggregatesRule when optimizing for Elastic.
I’m surprised that Druid’s logic is too complicated. I believe that Druid’s capabilities for approximate distinct-count are similar to Elastic’s, and I believe they require similar things from Calcite; if so, the code should be similar, right?

In your PR, what happens if someone executes a non-approximate COUNT(DISTINCT …) against the Elastic adapter? It looks as if it becomes a non-distinct COUNT. That doesn’t seem right.

Julian

> On Jan 22, 2019, at 1:22 PM, Andrei Sereda <[email protected]> wrote:
>
> Does it make sense to remove this rule from the planner during the
> optimization of queries to Elastic?
>
> Thanks for the hint, Stamatis. I did remove
> AggregateExpandDistinctAggregatesRule from elastic planner and things are
> working.
> For more info see PR-1008 <https://github.com/apache/calcite/pull/1008>.
>
> On Mon, Jan 21, 2019 at 3:24 AM Stamatis Zampetakis <[email protected]>
> wrote:
>
>> Hi Andrei,
>>
>> From what you say it seems that if AggregateExpandDistinctAggregatesRule
>> was not applied you wouldn't have a problem translating this to Elastic.
>> Does it make sense to remove this rule from the planner during the
>> optimization of queries to Elastic?
>>
>> Best,
>> Stamatis
>>
>> On Sat, Jan 19, 2019 at 1:35 AM Andrei Sereda <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I’m trying to push-down SQL APPROX_COUNT_DISTINCT() function into elastic
>>> as cardinality
>>> <https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html>
>>> aggregation.
>>>
>>> Example of SQL
>>>
>>> select col1, APPROX_COUNT_DISTINCT(col2) from elastic group by col1
>>>
>>> Above gets converted into the following plan (edited to make more
>>> readable):
>>>
>>> ElasticsearchToEnumerableConverter
>>>   ElasticsearchAggregate(group=[{0}], EXPR$1=[COUNT($1)])
>>>     ElasticsearchAggregate(group=[{0, 1}])
>>>       ElasticsearchProject(EXPR$0=[CAST(ITEM($0, 'col1'))
>>>           EXPR$1=[CAST(ITEM($0, 'col2'))])
>>>         ElasticsearchTableScan(table=[[elastic, zips]])
>>>
>>> I presume AggregateExpandDistinctAggregatesRule creates two aggregations?
>>> If so, what is the correct / recommended way to identify those as
>>> originated from APPROX_COUNT_DISTINCT in ElasticSearchAggregate
>>> <https://github.com/apache/calcite/blob/master/elasticsearch/src/main/java/org/apache/calcite/adapter/elasticsearch/ElasticsearchAggregate.java>
>>> ? Note no distinct in first aggregation.
>>>
>>> Also note that when multiple columns are used for approx count (select
>>> approx(c1), approx(c2)) there is just a single ElasticsearchAggregate so
>>> it is not an issue (since use of cardinality can be inferred from
>>> AggregateCall.isDistinct / isApproximate flags).
>>>
>>> Druid adapter has some logic around APPROX_COUNT_DISTINCT() but it looks
>>> too complicated.
>>>
>>> Any hints would be appreciated.
>>>
>>> Regards,
>>> Andrei.

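To make the COUNT(DISTINCT …) concern concrete, here is a minimal sketch of the decision an adapter has to make from the isDistinct / isApproximate flags mentioned in the thread. It is not the code in the PR and it does not use Calcite classes: `CardinalityMapping`, `CountCall` and `toElasticAggregation` are hypothetical stand-ins, and rejecting the exact case is only one possible policy, assumed here for illustration. The aggregation names `cardinality` and `value_count` are the Elasticsearch metric aggregations.

```java
// Hypothetical stand-in types; not part of Calcite or the Elasticsearch adapter.
final class CardinalityMapping {

    /** Simplified model of a COUNT call carrying the two flags from the thread. */
    static final class CountCall {
        final boolean distinct;
        final boolean approximate;

        CountCall(boolean distinct, boolean approximate) {
            this.distinct = distinct;
            this.approximate = approximate;
        }
    }

    /** Picks the Elasticsearch metric aggregation for a COUNT call. */
    static String toElasticAggregation(CountCall call) {
        if (!call.distinct) {
            // Plain COUNT(col): count the values, no de-duplication needed.
            return "value_count";
        }
        if (call.approximate) {
            // APPROX_COUNT_DISTINCT(col): cardinality is approximate by design.
            return "cardinality";
        }
        // Exact COUNT(DISTINCT col): assumed policy is to refuse the push-down
        // rather than silently return a non-distinct or approximate count.
        throw new UnsupportedOperationException(
            "exact COUNT(DISTINCT) cannot be pushed down as a single aggregation");
    }

    private CardinalityMapping() {}
}
```

Under these assumptions, `toElasticAggregation(new CountCall(true, true))` selects `cardinality`, while `new CountCall(true, false)` is rejected, so an exact distinct count never quietly degrades into a plain COUNT.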