ElasticSearch Adapter. converting APPROX_COUNT_DISTINCT into Elastic cardinality

Andrei Sereda Fri, 18 Jan 2019 16:36:19 -0800

Hello,

I’m trying to push down the SQL APPROX_COUNT_DISTINCT() function into
Elasticsearch as a cardinality
<https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html>
aggregation.
Example SQL:


select col1, APPROX_COUNT_DISTINCT(col2) from elastic group by col1

The query above gets converted into the following plan (edited for
readability):

ElasticsearchToEnumerableConverter
  ElasticsearchAggregate(group=[{0}], EXPR$1=[COUNT($1)])
    ElasticsearchAggregate(group=[{0, 1}])
      ElasticsearchProject(EXPR$0=[CAST(ITEM($0, 'col1'))], EXPR$1=[CAST(ITEM($0, 'col2'))])
        ElasticsearchTableScan(table=[[elastic, zips]])

I presume AggregateExpandDistinctAggregatesRule creates the two aggregations?
If so, what is the correct / recommended way to identify them in
ElasticsearchAggregate
<https://github.com/apache/calcite/blob/master/elasticsearch/src/main/java/org/apache/calcite/adapter/elasticsearch/ElasticsearchAggregate.java>
as originating from APPROX_COUNT_DISTINCT? Note that neither aggregation in
the plan carries a distinct flag: the outer one is a plain COUNT($1) and the
inner one only groups by {0, 1}.

Also note that when multiple columns are used for the approximate count
(select approx(c1), approx(c2)) there is just a single ElasticsearchAggregate,
so it is not an issue there: the use of cardinality can be inferred from the
AggregateCall.isDistinct / isApproximate flags.
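
To illustrate the single-aggregate case, here is a rough sketch of the
inference I have in mind. It is not the adapter's actual code: AggSketch and
Call are hypothetical stand-ins for the flags read from AggregateCall, and
mapping a plain COUNT to value_count (and rejecting an exact COUNT DISTINCT)
is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not part of the Calcite Elasticsearch adapter. */
final class AggSketch {

  /** Stand-in for the pieces of an AggregateCall that matter here. */
  static final class Call {
    final String name;          // output field name, e.g. EXPR$0
    final String field;         // Elasticsearch field being counted
    final boolean distinct;     // mirrors AggregateCall.isDistinct()
    final boolean approximate;  // mirrors AggregateCall.isApproximate()

    Call(String name, String field, boolean distinct, boolean approximate) {
      this.name = name;
      this.field = field;
      this.distinct = distinct;
      this.approximate = approximate;
    }
  }

  /** Picks the Elasticsearch metric aggregation for a COUNT call. */
  static String elasticAggType(Call call) {
    if (call.distinct && call.approximate) {
      return "cardinality";     // APPROX_COUNT_DISTINCT(field)
    }
    if (!call.distinct) {
      return "value_count";     // plain COUNT(field)
    }
    // Assumption of this sketch: cardinality is approximate, so an exact
    // COUNT(DISTINCT field) is not pushed down.
    throw new UnsupportedOperationException("exact COUNT(DISTINCT) not pushed down");
  }

  /** Renders the "aggregations" part of a search request body. */
  static String toJson(List<Call> calls) {
    List<String> parts = new ArrayList<>();
    for (Call c : calls) {
      parts.add(String.format("\"%s\":{\"%s\":{\"field\":\"%s\"}}",
          c.name, elasticAggType(c), c.field));
    }
    return "{\"aggregations\":{" + String.join(",", parts) + "}}";
  }
}
```

With two approximate distinct counts on c1 and c2 this yields two sibling
cardinality aggregations. The problem described above is that in the GROUP BY
plan these flags are no longer present on the calls that reach
ElasticsearchAggregate.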

The Druid adapter has some logic around APPROX_COUNT_DISTINCT(), but it looks
too complicated.

Any hints would be appreciated.

Regards,
Andrei.
