Hi everyone,

We made some progress in the implementation of UDAGG (FLINK-5564). However,
we realized that there are cases where users may want to use state backend
to store the data. For instance, the built-in MaxWithRetractAggFunction
currently create a hashMap to store the historical data. It will have
problem when the # of keys are huge enough, thereby leading to OOM.

In FLINK-6544, we have proposed an approach to expose State Backend
Interface for UDAGG. A brief design doc can be found in
https://docs.google.com/document/d/1g-wHOuFj5pMaYMJg90kVIO2IHbiQiO26nWscLIOn50c/edit

I am opening this discussion thread, as I realized there are recently some
open jiras which are towards to implement some special aggregators, such as
"count distinct". IMO, "count distinct" is just an UDAGG. With the new
proposed FLINK-6544, we can just make it as a built-in agg without changing
the current UDAGG framework.

@Fabian, @Haohui, @Radu, please take a look at FLINK-6544 and let me know
what you think.
Btw, we do not need include this change for release 1.3 in our opinion.

Regards,
Shaoxuan

Reply via email to