Thanks, Fabian, for the review. I will incorporate the feedback, finalize the design doc, and open a JIRA to track all sub-tasks. Please also feel free to comment if there are any other related DISTINCT aggregation use cases not covered by the design doc.
One higher-level question regarding #4: should we always keep the Table API's functionality a superset of the SQL API's? I have seen some features that are available in the Table API but not in the SQL API, and I was wondering whether that is a rule we must follow during development.

--
Rong

On Wed, Feb 14, 2018 at 2:32 AM, Fabian Hueske <fhue...@gmail.com> wrote:

> Hi Rong,
>
> Thanks for taking the initiative to improve the support for DISTINCT
> aggregations!
> I've made a pass over your design document and left a couple of comments.
> I think it is a really good write up and serves as a good start.
>
> IMO, the next steps could be to
> 1) continue and finalize the discussion on the design doc. Feel free to
> open a new umbrella JIRA and link your doc there.
> 2) check which JIRAs are still relevant. Close or reorganize them according
> to the plan in your design doc and make them subissues of the umbrella
> issue.
> 3) add support for DISTINCT in SQL
> 4) later extend the Table API to also support distinct aggregations
> (this would be mostly API changes since the execution is solved before)
>
> Let me know what you think.
>
> Best, Fabian
>
>
> 2018-02-14 3:07 GMT+01:00 Rong Rong <walter...@gmail.com>:
>
> > Hi Community,
> >
> > We are working on support of distinct aggregators over data stream on
> > Table/SQL API. Currently there seem to be many JIRAs related to
> > distinct agg over stream use cases which are still pending (FLINK-6249
> > <https://issues.apache.org/jira/browse/FLINK-6249>, FLINK-6260
> > <https://issues.apache.org/jira/browse/FLINK-6260>, FLINK-5315
> > <https://issues.apache.org/jira/browse/FLINK-5315>, FLINK-6335
> > <https://issues.apache.org/jira/browse/FLINK-6335>, FLINK-6373
> > <https://issues.apache.org/jira/browse/FLINK-6373>, FLINK-6250
> > <https://issues.apache.org/jira/browse/FLINK-6250>, etc) and I am having
> > some concerns when trying to come up with a solution as there might be
> > other use cases out there.
> >
> > I summarized a write up and categorized the use cases into unbounded or
> > bounded aggregations and proposed a solution through modifying and adding
> > new distinct aggregate functions using UDAGG API with DataView. Please find
> > it here
> > <https://docs.google.com/document/d/1zj6OA-K2hi7ah8Fo-xTQB-mVmYfm6LsN2_NHgTCVmJI/edit?usp=sharing>.
> >
> > Any comments or suggestions are highly appreciated.
> >
> > Many Thanks,
> > Rong
> >
>