Ah, I’d forgotten about Table.rolledUpColumnValidInsideAgg (for https://issues.apache.org/jira/browse/CALCITE-1787 <https://issues.apache.org/jira/browse/CALCITE-1787>). Yes, Elastic doesn’t have to deal with that complexity.
Since you throw for non-approximate distinct-count, I think it will be fine. You COULD keep AggregateExpandDistinctAggregatesRule around to handle such aggregates but I think your users would prefer that you throw. Julian > On Jan 22, 2019, at 3:30 PM, Andrei Sereda <[email protected]> wrote: > > …. require similar things from Calcite; if so, the code should be similar, > right? > > Druid uses Table.rolledUpColumnValidInsideAgg function instead of > explicitly removing AggregateExpandDistinctAggregatesRule plus a few custom > rules in DruidRules > <https://github.com/apache/calcite/blob/master/druid/src/main/java/org/apache/calcite/adapter/druid/DruidRules.java> > . > > In your PR, what happens if someone executes a non-approximate > COUNT(DISTINCT …) against Elastic adapter? It looks as if it becomes a > non-distinct COUNT. That doesn’t seem right. > > Unfortunately elastic doesn’t have precise distinct count (see Support for > precise cardinality aggregation > <https://github.com/elastic/elasticsearch/issues/15876>). Currently I’m > throwing InvalidRelException for non-approximate distinct counts > <https://github.com/apache/calcite/pull/1008/files#diff-80f1c109f5b5c10a136181eb650ff454R75> > : > > if (aggCall.isDistinct() && !aggCall.isApproximate()) { > final String message = String.format(Locale.ROOT, "Only approximate distinct > " > + "aggregations are supported in Elastic (cardinality > aggregation). Use %s function", > SqlStdOperatorTable.APPROX_COUNT_DISTINCT.getName()); > throw new InvalidRelException(message); > } > > Should (Can) it be handled differently ? > > On Tue, Jan 22, 2019 at 5:44 PM Julian Hyde <[email protected]> wrote: > >> Yes, I think it makes sense to disable >> AggregateExpandDistinctAggregatesRule when optimizing for Elastic. >> >> I’m surprised that Druid’s logic is too complicated. If believe that >> Druid’s capabilities for approximate distinct-count are similar to Elastic, >> and I believe they require similar things from Calcite; if so, the code >> should be similar, right? >> >> In your PR, what happens if someone executes a non-approximate >> COUNT(DISTINCT …) against Elastic adapter? It looks as if it becomes a >> non-distinct COUNT. That doesn’t seem right. >> >> Julian >> >> >> >>> On Jan 22, 2019, at 1:22 PM, Andrei Sereda <[email protected]> wrote: >>> >>> Does it make sense to remove this rule from the planner during the >>> optimization of queries to Elastic? >>> >>> Thanks for the hint, Stamatis. I did remove >>> AggregateExpandDistinctAggregatesRule from elastic planner and things are >>> working. >>> For more info see PR-1008 <https://github.com/apache/calcite/pull/1008>. >>> >>> On Mon, Jan 21, 2019 at 3:24 AM Stamatis Zampetakis <[email protected]> >>> wrote: >>> >>>> Hi Andrei, >>>> >>>> From what you say it seems that if AggregateExpandDistinctAggregatesRule >>>> was not applied you wouldn't have a problem translating this to Elastic. >>>> Does it make sense to remove this rule from the planner during the >>>> optimization of queries to Elastic? >>>> >>>> Best, >>>> Stamatis >>>> >>>> Στις Σάβ, 19 Ιαν 2019 στις 1:35 π.μ., ο/η Andrei Sereda >> <[email protected]> >>>> έγραψε: >>>> >>>>> Hello, >>>>> >>>>> I’m trying to push-down SQL APPROX_COUNT_DISTINCT() function into >> elastic >>>>> as cardinality >>>>> < >>>>> >>>> >> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html >>>>>> >>>>> aggregation. >>>>> Example of SQL >>>>> >>>>> select col1, APPROX_COUNT_DISTINCT(col2) from elastic group by col1 >>>>> >>>>> Above gets converted into the following plan (edited to make more >>>> readable) >>>>> : >>>>> >>>>> ElasticsearchToEnumerableConverter >>>>> ElasticsearchAggregate(group=[{0}], EXPR$1=[COUNT($1)]) >>>>> ElasticsearchAggregate(group=[{0, 1}]) >>>>> ElasticsearchProject(EXPR$0=[CAST(ITEM($0, 'col1')) >>>>> EXPR$1=[CAST(ITEM($0, 'col2'))]) >>>>> ElasticsearchTableScan(table=[[elastic, zips]]) >>>>> >>>>> I presume AggregateExpandDistinctAggregatesRule creates two >> aggregations >>>> ? >>>>> If so, what is the correct / recommended way to identify those as >>>>> originated from APPROX_COUNT_DISTINCT in ElasticSearchAggregate >>>>> < >>>>> >>>> >> https://github.com/apache/calcite/blob/master/elasticsearch/src/main/java/org/apache/calcite/adapter/elasticsearch/ElasticsearchAggregate.java >>>>>> >>>>> ? Note no distinct in first aggregation. >>>>> >>>>> Alos note that when multiple columns are used for approx count (select >>>>> approx(c1), approx(c2)) there is just a single ElasticsearchAggregate >> so >>>> it >>>>> is not an issue (since use of cardinality can be inferred from >>>>> AggregateCall.isDistinct / isApproximate flags). >>>>> >>>>> Druid adapter has some logic around APPROX_COUNT_DISTINCT() it but it >>>> looks >>>>> too complicated. >>>>> >>>>> Any hints would be appreciated. >>>>> >>>>> Regards, >>>>> Andrei. >>>>> >>>> >> >>
