Re: ElasticSearch Adapter. converting APPROX_COUNT_DISTINCT into Elastic cardinality

Andrei Sereda Tue, 22 Jan 2019 15:30:37 -0800

…. require similar things from Calcite; if so, the code should be similar,
right?


Druid uses Table.rolledUpColumnValidInsideAgg function instead of
explicitly removing AggregateExpandDistinctAggregatesRule plus a few custom
rules in DruidRules
<https://github.com/apache/calcite/blob/master/druid/src/main/java/org/apache/calcite/adapter/druid/DruidRules.java>
.

In your PR, what happens if someone executes a non-approximate
COUNT(DISTINCT …) against Elastic adapter? It looks as if it becomes a
non-distinct COUNT. That doesn’t seem right.

Unfortunately elastic doesn’t have precise distinct count (see Support for
precise cardinality aggregation
<https://github.com/elastic/elasticsearch/issues/15876>). Currently I’m
throwing InvalidRelException for non-approximate distinct counts
<https://github.com/apache/calcite/pull/1008/files#diff-80f1c109f5b5c10a136181eb650ff454R75>
:

if (aggCall.isDistinct() && !aggCall.isApproximate()) {
  final String message = String.format(Locale.ROOT, "Only approximate distinct "
      + "aggregations are supported in Elastic (cardinality
aggregation). Use %s function",
      SqlStdOperatorTable.APPROX_COUNT_DISTINCT.getName());
  throw new InvalidRelException(message);
}

Should (Can) it be handled differently ?

On Tue, Jan 22, 2019 at 5:44 PM Julian Hyde <[email protected]> wrote:

> Yes, I think it makes sense to disable
> AggregateExpandDistinctAggregatesRule when optimizing for Elastic.
>
> I’m surprised that Druid’s logic is too complicated. If believe that
> Druid’s capabilities for approximate distinct-count are similar to Elastic,
> and I believe they require similar things from Calcite; if so, the code
> should be similar, right?
>
> In your PR, what happens if someone executes a non-approximate
> COUNT(DISTINCT …) against Elastic adapter? It looks as if it becomes a
> non-distinct COUNT. That doesn’t seem right.
>
> Julian
>
>
>
> > On Jan 22, 2019, at 1:22 PM, Andrei Sereda <[email protected]> wrote:
> >
> > Does it make sense to remove this rule from the planner during the
> > optimization of queries to Elastic?
> >
> > Thanks for the hint, Stamatis. I did remove
> > AggregateExpandDistinctAggregatesRule from elastic planner and things are
> > working.
> > For more info see PR-1008 <https://github.com/apache/calcite/pull/1008>.
> >
> > On Mon, Jan 21, 2019 at 3:24 AM Stamatis Zampetakis <[email protected]>
> > wrote:
> >
> >> Hi Andrei,
> >>
> >> From what you say it seems that if AggregateExpandDistinctAggregatesRule
> >> was not applied you wouldn't have a problem translating this to Elastic.
> >> Does it make sense to remove this rule from the planner during the
> >> optimization of queries to Elastic?
> >>
> >> Best,
> >> Stamatis
> >>
> >> Στις Σάβ, 19 Ιαν 2019 στις 1:35 π.μ., ο/η Andrei Sereda
> <[email protected]>
> >> έγραψε:
> >>
> >>> Hello,
> >>>
> >>> I’m trying to push-down SQL APPROX_COUNT_DISTINCT() function into
> elastic
> >>> as cardinality
> >>> <
> >>>
> >>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html
> >>>>
> >>> aggregation.
> >>> Example of SQL
> >>>
> >>> select col1, APPROX_COUNT_DISTINCT(col2) from elastic group by col1
> >>>
> >>> Above gets converted into the following plan (edited to make more
> >> readable)
> >>> :
> >>>
> >>> ElasticsearchToEnumerableConverter
> >>>  ElasticsearchAggregate(group=[{0}], EXPR$1=[COUNT($1)])
> >>>    ElasticsearchAggregate(group=[{0, 1}])
> >>>      ElasticsearchProject(EXPR$0=[CAST(ITEM($0, 'col1'))
> >>> EXPR$1=[CAST(ITEM($0, 'col2'))])
> >>>        ElasticsearchTableScan(table=[[elastic, zips]])
> >>>
> >>> I presume AggregateExpandDistinctAggregatesRule creates two
> aggregations
> >> ?
> >>> If so, what is the correct / recommended way to identify those as
> >>> originated from APPROX_COUNT_DISTINCT in ElasticSearchAggregate
> >>> <
> >>>
> >>
> https://github.com/apache/calcite/blob/master/elasticsearch/src/main/java/org/apache/calcite/adapter/elasticsearch/ElasticsearchAggregate.java
> >>>>
> >>> ? Note no distinct in first aggregation.
> >>>
> >>> Alos note that when multiple columns are used for approx count (select
> >>> approx(c1), approx(c2)) there is just a single ElasticsearchAggregate
> so
> >> it
> >>> is not an issue (since use of cardinality can be inferred from
> >>> AggregateCall.isDistinct / isApproximate flags).
> >>>
> >>> Druid adapter has some logic around APPROX_COUNT_DISTINCT() it but it
> >> looks
> >>> too complicated.
> >>>
> >>> Any hints would be appreciated.
> >>>
> >>> Regards,
> >>> Andrei.
> >>>
> >>
>
>

Re: ElasticSearch Adapter. converting APPROX_COUNT_DISTINCT into Elastic cardinality

Reply via email to