Hi everyone, We made some progress in the implementation of UDAGG (FLINK-5564). However, we realized that there are cases where users may want to use state backend to store the data. For instance, the built-in MaxWithRetractAggFunction currently create a hashMap to store the historical data. It will have problem when the # of keys are huge enough, thereby leading to OOM.
In FLINK-6544, we have proposed an approach to expose State Backend Interface for UDAGG. A brief design doc can be found in https://docs.google.com/document/d/1g-wHOuFj5pMaYMJg90kVIO2IHbiQiO26nWscLIOn50c/edit I am opening this discussion thread, as I realized there are recently some open jiras which are towards to implement some special aggregators, such as "count distinct". IMO, "count distinct" is just an UDAGG. With the new proposed FLINK-6544, we can just make it as a built-in agg without changing the current UDAGG framework. @Fabian, @Haohui, @Radu, please take a look at FLINK-6544 and let me know what you think. Btw, we do not need include this change for release 1.3 in our opinion. Regards, Shaoxuan