[
https://issues.apache.org/jira/browse/FLINK-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15988376#comment-15988376
]
ASF GitHub Bot commented on FLINK-6373:
---------------------------------------
Github user fhueske commented on the issue:
https://github.com/apache/flink/pull/3765
Hi @haohui,
I suggested before that PR #3771 might be used for DISTINCT group window
functions. However, this does not work because we cannot register state for an
AggregateFunction. The benefit of the approach of #3771 would have been that it
does not need to deserialize the Map every time a record is accumulated (or
retracted). Instead, the distinct values are kept in a MapState that can be
accessed (and deserialized) per lookup key. But this approach does not work
with the AggregateFunction that we use for early aggregation.
To be honest, I'm a bit concerned about the performance of the approach of
this PR because the state of the DistinctAccumulator (i.e., the complete map)
will be de/serialized every time we access it.
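To make the cost difference concrete, here is a minimal Python sketch, not Flink code. Both classes are hypothetical stand-ins: `WholeMapAccumulatorState` mimics an accumulator whose complete map is stored as one serialized blob, and `PerKeyMapState` mimics a MapState-like store with one serialized entry per key. `pickle` stands in for the state serializer, and `bytes_moved` is an illustrative counter, not a real metric.

```python
import pickle


class WholeMapAccumulatorState:
    """Hypothetical stand-in: the whole distinct map is one serialized blob."""

    def __init__(self):
        self._blob = pickle.dumps({})
        self.bytes_moved = 0

    def accumulate(self, value):
        counts = pickle.loads(self._blob)   # deserialize the complete map
        self.bytes_moved += len(self._blob)
        is_new = value not in counts
        counts[value] = counts.get(value, 0) + 1
        self._blob = pickle.dumps(counts)   # serialize the complete map again
        self.bytes_moved += len(self._blob)
        return is_new


class PerKeyMapState:
    """Hypothetical stand-in: one serialized entry per distinct value."""

    def __init__(self):
        self._entries = {}                  # serialized key -> serialized count
        self.bytes_moved = 0

    def accumulate(self, value):
        key = pickle.dumps(value)
        raw = self._entries.get(key)        # touch only the entry for this key
        count = pickle.loads(raw) if raw is not None else 0
        if raw is not None:
            self.bytes_moved += len(raw)
        new_raw = pickle.dumps(count + 1)
        self._entries[key] = new_raw
        self.bytes_moved += len(key) + len(new_raw)
        return count == 0


def count_distinct(state, values):
    """Feed all values into the state and count how many were seen first."""
    return sum(1 for v in values if state.accumulate(v))
```

Both variants return the same distinct count, but the blob variant moves bytes proportional to the size of the map on every record, while the per-key variant moves only the entry it touches. Retraction would follow the same access pattern in both cases.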
I think we can use this approach for now, but we should look into whether we
can use an approach similar to the batch side, where distinct aggregations (on
different keys) are translated into multiple aggregations which are later
joined together (the join would be rather cheap because it's a 1-to-1 join).
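As a rough illustration of that rewrite, the sketch below is plain Python, not the planner's actual translation. The helper names `distinct_agg` and `join_one_to_one`, the column names, and the sample rows are all made up for the example. Each distinct aggregation first deduplicates on (group, column) and then runs a plain aggregate per group. The per-group results are then joined on the group key, where each side has exactly one row per group.

```python
from collections import defaultdict


def distinct_agg(rows, group_key, column, agg):
    """Compute agg(DISTINCT column) per group in two steps."""
    # Step 1: deduplicate on (group, column).
    deduped = {(row[group_key], row[column]) for row in rows}
    # Step 2: plain (non-distinct) aggregation per group.
    per_group = defaultdict(list)
    for group, value in deduped:
        per_group[group].append(value)
    return {group: agg(vals) for group, vals in per_group.items()}


def join_one_to_one(left, right):
    """Join two per-group results; each side holds one row per group key."""
    return {group: (left[group], right[group]) for group in left}


# Hypothetical input for: SELECT g, COUNT(DISTINCT a), SUM(DISTINCT b) ... GROUP BY g
rows = [
    {"g": "x", "a": 1, "b": 10},
    {"g": "x", "a": 1, "b": 20},
    {"g": "x", "a": 2, "b": 10},
    {"g": "y", "a": 3, "b": 5},
    {"g": "y", "a": 3, "b": 5},
]

count_a = distinct_agg(rows, "g", "a", len)   # COUNT(DISTINCT a)
sum_b = distinct_agg(rows, "g", "b", sum)     # SUM(DISTINCT b)
result = join_one_to_one(count_a, sum_b)
```

The sketch ignores NULL handling and windowing. It is only meant to show why the final join is cheap: both inputs are already reduced to one row per group.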
I'll have a look at this PR later today.
Thanks, Fabian
> Add runtime support for distinct aggregation over grouped windows
> -----------------------------------------------------------------
>
> Key: FLINK-6373
> URL: https://issues.apache.org/jira/browse/FLINK-6373
> Project: Flink
> Issue Type: Bug
> Components: Table API & SQL
> Reporter: Haohui Mai
> Assignee: Haohui Mai
>
> This is a follow up task for FLINK-6335. FLINK-6335 enables parsing the
> distinct aggregations over grouped windows. This jira tracks the effort of
> adding runtime support for the query.