Hello,
I’m trying to push down the SQL APPROX_COUNT_DISTINCT() function into
Elasticsearch as a cardinality
<https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html>
aggregation.
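For context, a grouped approximate distinct count would presumably map to a terms aggregation with a nested cardinality sub-aggregation. The following is a minimal sketch of such a request body, not what the Calcite adapter currently emits. The class name, the bucket names (by_..., approx_...) and the hand-built JSON are assumptions made for illustration.

```java
/** Sketch only: builds the request body by hand; not how the Calcite adapter does it. */
final class CardinalityRequestSketch {

  /**
   * Returns a search body that groups by groupField (terms aggregation) and
   * estimates the distinct count of countField per bucket (cardinality).
   * Field names are inserted as-is, without JSON escaping.
   */
  static String build(String groupField, String countField) {
    return String.join("\n",
        "{",
        "  \"size\": 0,",
        "  \"aggregations\": {",
        "    \"by_" + groupField + "\": {",
        "      \"terms\": { \"field\": \"" + groupField + "\" },",
        "      \"aggregations\": {",
        "        \"approx_" + countField + "\": {",
        "          \"cardinality\": { \"field\": \"" + countField + "\" }",
        "        }",
        "      }",
        "    }",
        "  }",
        "}");
  }

  public static void main(String[] args) {
    // Prints the body for a query grouped on col1 with an approximate count of col2.
    System.out.println(build("col1", "col2"));
  }
}
```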
Example SQL:
select col1, APPROX_COUNT_DISTINCT(col2) from elastic group by col1
The above gets converted into the following plan (edited for readability):
ElasticsearchToEnumerableConverter
  ElasticsearchAggregate(group=[{0}], EXPR$1=[COUNT($1)])
    ElasticsearchAggregate(group=[{0, 1}])
      ElasticsearchProject(EXPR$0=[CAST(ITEM($0, 'col1'))], EXPR$1=[CAST(ITEM($0, 'col2'))])
        ElasticsearchTableScan(table=[[elastic, zips]])
I presume AggregateExpandDistinctAggregatesRule creates the two aggregations?
If so, what is the correct / recommended way, in ElasticsearchAggregate
<https://github.com/apache/calcite/blob/master/elasticsearch/src/main/java/org/apache/calcite/adapter/elasticsearch/ElasticsearchAggregate.java>,
to identify them as originating from APPROX_COUNT_DISTINCT? Note that there
is no DISTINCT in the first aggregation.
Also note that when multiple columns are used for approximate counts (select
approx(c1), approx(c2)), there is just a single ElasticsearchAggregate, so
this is not an issue (the use of cardinality can be inferred from the
AggregateCall.isDistinct / isApproximate flags).
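The single-aggregate case can be sketched as follows. This is a minimal illustration under stated assumptions. AggCallInfo is a hypothetical stand-in for the few AggregateCall properties involved (it is not Calcite's class), and rejecting an exact COUNT(DISTINCT) is a choice made for this sketch only.

```java
/** Hypothetical stand-in for the AggregateCall properties used below; not Calcite's class. */
final class AggCallInfo {
  final String function;      // aggregate function name, e.g. "COUNT" or "SUM"
  final boolean distinct;     // plays the role of AggregateCall.isDistinct()
  final boolean approximate;  // plays the role of AggregateCall.isApproximate()

  AggCallInfo(String function, boolean distinct, boolean approximate) {
    this.function = function;
    this.distinct = distinct;
    this.approximate = approximate;
  }
}

final class CardinalityInference {

  /** Picks an Elasticsearch metric aggregation name for one aggregate call. */
  static String elasticAggFor(AggCallInfo call) {
    boolean isCount = "COUNT".equals(call.function);
    if (isCount && call.distinct && call.approximate) {
      // Approximate distinct count: cardinality is itself an approximation.
      return "cardinality";
    }
    if (isCount && !call.distinct) {
      // Plain COUNT(col): count the values present in the field.
      return "value_count";
    }
    // Assumption of this sketch: everything else, including exact
    // COUNT(DISTINCT), is not pushed down.
    throw new IllegalArgumentException(
        "not pushed down in this sketch: " + call.function);
  }
}
```

With these stand-ins, `elasticAggFor(new AggCallInfo("COUNT", true, true))` selects cardinality, while the same call with `approximate` set to false is rejected.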
The Druid adapter has some logic around APPROX_COUNT_DISTINCT(), but it looks
too complicated.
Any hints would be appreciated.
Regards,
Andrei.