…. require similar things from Calcite; if so, the code should be similar, right?
Druid uses Table.rolledUpColumnValidInsideAgg function instead of explicitly removing AggregateExpandDistinctAggregatesRule plus a few custom rules in DruidRules <https://github.com/apache/calcite/blob/master/druid/src/main/java/org/apache/calcite/adapter/druid/DruidRules.java> . In your PR, what happens if someone executes a non-approximate COUNT(DISTINCT …) against Elastic adapter? It looks as if it becomes a non-distinct COUNT. That doesn’t seem right. Unfortunately elastic doesn’t have precise distinct count (see Support for precise cardinality aggregation <https://github.com/elastic/elasticsearch/issues/15876>). Currently I’m throwing InvalidRelException for non-approximate distinct counts <https://github.com/apache/calcite/pull/1008/files#diff-80f1c109f5b5c10a136181eb650ff454R75> : if (aggCall.isDistinct() && !aggCall.isApproximate()) { final String message = String.format(Locale.ROOT, "Only approximate distinct " + "aggregations are supported in Elastic (cardinality aggregation). Use %s function", SqlStdOperatorTable.APPROX_COUNT_DISTINCT.getName()); throw new InvalidRelException(message); } Should (Can) it be handled differently ? On Tue, Jan 22, 2019 at 5:44 PM Julian Hyde <[email protected]> wrote: > Yes, I think it makes sense to disable > AggregateExpandDistinctAggregatesRule when optimizing for Elastic. > > I’m surprised that Druid’s logic is too complicated. If believe that > Druid’s capabilities for approximate distinct-count are similar to Elastic, > and I believe they require similar things from Calcite; if so, the code > should be similar, right? > > In your PR, what happens if someone executes a non-approximate > COUNT(DISTINCT …) against Elastic adapter? It looks as if it becomes a > non-distinct COUNT. That doesn’t seem right. > > Julian > > > > > On Jan 22, 2019, at 1:22 PM, Andrei Sereda <[email protected]> wrote: > > > > Does it make sense to remove this rule from the planner during the > > optimization of queries to Elastic? > > > > Thanks for the hint, Stamatis. I did remove > > AggregateExpandDistinctAggregatesRule from elastic planner and things are > > working. > > For more info see PR-1008 <https://github.com/apache/calcite/pull/1008>. > > > > On Mon, Jan 21, 2019 at 3:24 AM Stamatis Zampetakis <[email protected]> > > wrote: > > > >> Hi Andrei, > >> > >> From what you say it seems that if AggregateExpandDistinctAggregatesRule > >> was not applied you wouldn't have a problem translating this to Elastic. > >> Does it make sense to remove this rule from the planner during the > >> optimization of queries to Elastic? > >> > >> Best, > >> Stamatis > >> > >> Στις Σάβ, 19 Ιαν 2019 στις 1:35 π.μ., ο/η Andrei Sereda > <[email protected]> > >> έγραψε: > >> > >>> Hello, > >>> > >>> I’m trying to push-down SQL APPROX_COUNT_DISTINCT() function into > elastic > >>> as cardinality > >>> < > >>> > >> > https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html > >>>> > >>> aggregation. > >>> Example of SQL > >>> > >>> select col1, APPROX_COUNT_DISTINCT(col2) from elastic group by col1 > >>> > >>> Above gets converted into the following plan (edited to make more > >> readable) > >>> : > >>> > >>> ElasticsearchToEnumerableConverter > >>> ElasticsearchAggregate(group=[{0}], EXPR$1=[COUNT($1)]) > >>> ElasticsearchAggregate(group=[{0, 1}]) > >>> ElasticsearchProject(EXPR$0=[CAST(ITEM($0, 'col1')) > >>> EXPR$1=[CAST(ITEM($0, 'col2'))]) > >>> ElasticsearchTableScan(table=[[elastic, zips]]) > >>> > >>> I presume AggregateExpandDistinctAggregatesRule creates two > aggregations > >> ? > >>> If so, what is the correct / recommended way to identify those as > >>> originated from APPROX_COUNT_DISTINCT in ElasticSearchAggregate > >>> < > >>> > >> > https://github.com/apache/calcite/blob/master/elasticsearch/src/main/java/org/apache/calcite/adapter/elasticsearch/ElasticsearchAggregate.java > >>>> > >>> ? Note no distinct in first aggregation. > >>> > >>> Alos note that when multiple columns are used for approx count (select > >>> approx(c1), approx(c2)) there is just a single ElasticsearchAggregate > so > >> it > >>> is not an issue (since use of cardinality can be inferred from > >>> AggregateCall.isDistinct / isApproximate flags). > >>> > >>> Druid adapter has some logic around APPROX_COUNT_DISTINCT() it but it > >> looks > >>> too complicated. > >>> > >>> Any hints would be appreciated. > >>> > >>> Regards, > >>> Andrei. > >>> > >> > >
