GitHub user fhueske commented on a diff in the pull request:
https://github.com/apache/flink/pull/5241#discussion_r162642637
--- Diff:
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/dataset/DataSetAggregate.scala
---
@@ -157,6 +158,7 @@ class DataSetAggregate(
} else {
inputDS
.reduceGroup(finalAgg)
+ .mapPartition(emptyProcessMapPartition.get)
--- End diff --
I thought about this again.
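I think we should extend `DataSetFinalAggFunction` to also implement
`MapPartitionFunction`, similar to `DataSetPreAggFunction`. The
`GroupReduceFunction.reduce()` and `MapPartitionFunction.mapPartition()`
methods have the same signature (an `Iterable` of input records and a
`Collector` for the output), so both can share the same code. For the
grouped aggregation, we use `reduceGroup(function)`, and for the non-grouped
aggregation we use `mapPartition(function).setParallelism(1)`. Setting the
parallelism is important here: a non-grouped aggregation must produce a single
global result, but with a higher parallelism each parallel `mapPartition` task
would emit its own partial result instead.
That way we avoid an additional function, and it's a cleaner design.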
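A minimal Scala sketch of the proposed design. It assumes simplified local stand-ins for Flink's `Collector`, `GroupReduceFunction` and `MapPartitionFunction` (defined over Scala collections so it runs without Flink; they are not the real Flink types) and a hypothetical `FinalAggFunction` that merges partial `(count, sum)` pairs. `mapPartition` simply delegates to `reduce`, and the hypothetical `runMapPartition` helper imitates one `mapPartition` call per parallel task to show why the non-grouped path needs parallelism 1.

```scala
// Simplified stand-ins for Flink's Collector, GroupReduceFunction and
// MapPartitionFunction (over Scala collections), NOT the real Flink types.
trait Collector[T] { def collect(record: T): Unit }

trait GroupReduceFunction[IN, OUT] {
  def reduce(values: Iterable[IN], out: Collector[OUT]): Unit
}

trait MapPartitionFunction[IN, OUT] {
  def mapPartition(values: Iterable[IN], out: Collector[OUT]): Unit
}

// Hypothetical final aggregation: merges partial (count, sum) pairs
// produced by a pre-aggregation step into one (count, sum) result.
class FinalAggFunction
    extends GroupReduceFunction[(Long, Long), (Long, Long)]
    with MapPartitionFunction[(Long, Long), (Long, Long)] {

  // Grouped path: invoked once per group.
  override def reduce(values: Iterable[(Long, Long)], out: Collector[(Long, Long)]): Unit = {
    val (count, sum) = values.foldLeft((0L, 0L)) {
      case ((c, s), (pc, ps)) => (c + pc, s + ps)
    }
    out.collect((count, sum))
  }

  // Non-grouped path: same signature, so it reuses the reduce logic.
  override def mapPartition(values: Iterable[(Long, Long)], out: Collector[(Long, Long)]): Unit =
    reduce(values, out)
}

object FinalAggDemo {
  // Hypothetical helper: calls mapPartition once per partition, the way
  // each parallel task would, and gathers everything that was collected.
  def runMapPartition[IN, OUT](f: MapPartitionFunction[IN, OUT], partitions: Seq[Seq[IN]]): List[OUT] = {
    val results = scala.collection.mutable.ListBuffer.empty[OUT]
    val collector = new Collector[OUT] { def collect(record: OUT): Unit = results += record }
    partitions.foreach(p => f.mapPartition(p, collector))
    results.toList
  }

  def main(args: Array[String]): Unit = {
    val partials = Seq(Seq((2L, 10L)), Seq((3L, 5L)), Seq((1L, 7L)))
    // Three tasks: one (count, sum) per task, so no single global result.
    println(runMapPartition(new FinalAggFunction, partials))
    // One task (parallelism 1): a single global (count, sum).
    println(runMapPartition(new FinalAggFunction, Seq(partials.flatten)))
  }
}
```

With three partitions the function emits three separate partial results; only when all partials land in one partition (parallelism 1) does it emit the single global `(6, 22)`.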
---