GitHub user fhueske commented on the issue:
https://github.com/apache/flink/pull/5555
Hi @walterddr and @hequn8128, thanks for the PR, review, and discussions.
The current implementation with the `DistinctAggDelegateFunction` and
accumulators takes the path of user-defined code although `DISTINCT` is a query
feature that does not require user-code.
Wouldn't it be easier if we just added two parameters to the
`AggregationCodeGenerator.generateAggregations()` method:
- distinctAggs: Array[Boolean]
- stateBackedDistinct: Option[Boolean]
and handle the distinct logic in the generated code? Given this information we
can configure the required MapViews (also reusing them across multiple
aggregation functions). Also, we would not need to wrap an aggregation function
but could access the MapView directly to check whether an input value is distinct.
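
To make the idea concrete, here is a minimal sketch of what inlined distinct handling could look like for a distinct SUM. It is only an illustration under assumptions: the class `DistinctSumSketch` and its methods are hypothetical, a plain `java.util.HashMap` stands in for the MapView, and the per-value counter plus the retract path are my own additions to show why a map (rather than a set) would be kept.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not actual Flink code: a SUM(DISTINCT x) where the
// distinct check is inlined next to the aggregation logic instead of being
// hidden in a wrapping aggregation function and accumulator.
final class DistinctSumSketch {

    // Stand-in for a MapView: value -> how often the value is currently present.
    private final Map<Long, Long> distinctCounts = new HashMap<>();

    // Plain accumulator state of the underlying SUM.
    private long sum = 0L;

    // Forward the value to the SUM only the first time it is seen.
    void accumulate(long value) {
        long seen = distinctCounts.merge(value, 1L, Long::sum);
        if (seen == 1L) {
            sum += value;
        }
    }

    // Remove the value from the SUM only when its last occurrence is retracted.
    void retract(long value) {
        Long seen = distinctCounts.get(value);
        if (seen == null) {
            return; // value was never accumulated, nothing to retract
        }
        if (seen == 1L) {
            distinctCounts.remove(value);
            sum -= value;
        } else {
            distinctCounts.put(value, seen - 1L);
        }
    }

    long getValue() {
        return sum;
    }
}
```

In this sketch, accumulating 5, 5, and 3 leaves the sum at 8, and the sum only drops to 3 once both occurrences of 5 have been retracted. The same map could in principle be shared by several distinct aggregates over the same field, which is the reuse mentioned above.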
This would mean a bit more implementation effort for the code generation,
but it would be the cleaner design because we would not need to wrap aggregation
functions and accumulators. It would avoid all problems with nested map views
and make the planning code simpler.
What do you think @walterddr, @hequn8128?
---