[
https://issues.apache.org/jira/browse/FLINK-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105203#comment-16105203
]
ASF GitHub Bot commented on FLINK-7206:
---------------------------------------
GitHub user fhueske commented on a diff in the pull request:
https://github.com/apache/flink/pull/4355#discussion_r130126823
--- Diff:
flink-libraries/flink-table/src/test/java/org/apache/flink/table/runtime/utils/JavaUserDefinedAggFunctions.java
---
@@ -135,4 +138,172 @@ public void retract(WeightedAvgAccum accumulator, int iValue, int iWeight) {
accumulator.count -= iWeight;
}
}
+
+ /**
+ * CountDistinct accumulator.
+ */
+ public static class CountDistinctAccum {
+ public MapView<String, Integer> map;
+ public long count;
+ }
+
+ /**
+ * CountDistinct aggregate.
+ */
+ public static class CountDistinct extends AggregateFunction<Long, CountDistinctAccum> {
--- End diff --
I don't think we should implement `COUNT DISTINCT` as a special
`AggregateFunction`, at least not in the long term.
I think it would be better to handle this inside the
`GeneratedAggregations` and pass only distinct values to the accumulate and
retract calls of user-defined aggregate functions. With this approach, any
aggregation function can be used with `DISTINCT`, and the state that tracks
distinct values can be shared across multiple aggregation functions. Work on
this approach has already started in PR #3783.
For now this is fine, but in the long run we should go for something like
PR #3783 (which also requires the `GeneratedAggregations.initialize()` method).
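To make the idea concrete, here is a minimal sketch in plain Java of what
"only accumulate and retract distinct values" could look like. It is not the
code from PR #3783 and does not use any Flink API: `Agg`, `CountAgg`, `SumAgg`
and `DistinctFilter` are hypothetical names invented for this illustration, and
the `HashMap` stands in for whatever state (for example a `MapView`) the
generated code would actually use.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an aggregate function; NOT Flink's AggregateFunction. */
interface Agg<T> {
    void accumulate(T value);
    void retract(T value);
    long getValue();
}

/** Plain COUNT that knows nothing about DISTINCT. */
final class CountAgg implements Agg<Integer> {
    private long count;
    public void accumulate(Integer value) { count++; }
    public void retract(Integer value) { count--; }
    public long getValue() { return count; }
}

/** Plain SUM that knows nothing about DISTINCT. */
final class SumAgg implements Agg<Integer> {
    private long sum;
    public void accumulate(Integer value) { sum += value; }
    public void retract(Integer value) { sum -= value; }
    public long getValue() { return sum; }
}

/**
 * Hypothetical filter that sits in front of the aggregates. It keeps one
 * occurrence counter per value and shares it across all wrapped aggregates.
 */
final class DistinctFilter<T> {
    // Occurrences per value; assumed to be backed by state in a real job.
    private final Map<T, Long> occurrences = new HashMap<>();
    private final List<Agg<T>> aggs;

    DistinctFilter(List<Agg<T>> aggs) {
        this.aggs = aggs;
    }

    /** Forwards the value only when it is seen for the first time. */
    void accumulate(T value) {
        long n = occurrences.merge(value, 1L, Long::sum);
        if (n == 1L) {
            for (Agg<T> agg : aggs) {
                agg.accumulate(value);
            }
        }
    }

    /** Forwards the retraction only when the last occurrence goes away. */
    void retract(T value) {
        Long n = occurrences.get(value);
        if (n == null) {
            return; // value was never accumulated; nothing to retract
        }
        if (n == 1L) {
            occurrences.remove(value);
            for (Agg<T> agg : aggs) {
                agg.retract(value);
            }
        } else {
            occurrences.put(value, n - 1);
        }
    }
}
```

In this sketch the aggregates stay unchanged, so `DISTINCT` works with any of
them, and a single occurrence map serves both `CountAgg` and `SumAgg`.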
> Implementation of DataView to support state access for UDAGG
> ------------------------------------------------------------
>
> Key: FLINK-7206
> URL: https://issues.apache.org/jira/browse/FLINK-7206
> Project: Flink
> Issue Type: Sub-task
> Components: Table API & SQL
> Reporter: Kaibo Zhou
> Assignee: Kaibo Zhou
>
> Implementation of MapView and ListView to support state access for UDAGG.