Re: Support distinct aggregation over data stream on Table/SQL API

Rong Rong Thu, 15 Feb 2018 11:08:06 -0800

Thanks Fabian for the review,

I will incorporate the feedback, finalize the design doc, and open a
JIRA to track all sub-tasks.
Please also feel free to comment if there are any other related DISTINCT
aggregation use cases not covered by the design doc.


One higher-level question regarding #4: should the Table API's
functionality always be a superset of the SQL API's?
I have seen some features that are available in the Table API but not in
SQL, and I was wondering whether that is a hard rule during development.

--
Rong

On Wed, Feb 14, 2018 at 2:32 AM, Fabian Hueske <fhue...@gmail.com> wrote:

> Hi Rong,
>
> Thanks for taking the initiative to improve the support for DISTINCT
> aggregations!
> I've made a pass over your design document and left a couple of comments.
> I think it is a really good write up and serves as a good start.
>
> IMO, the next steps could be to
> 1) continue and finalize the discussion on the design doc. Feel free to
> open a new umbrella JIRA and link your doc there.
> 2) check which JIRAs are still relevant. Close or reorganize them according
> to the plan in your design doc and make them subissues of the umbrella
> issue.
> 3) add support for DISTINCT in SQL
> 4) later extend the Table API to also support distinct aggregations
> (this would be mostly API changes, since the execution is solved before)
>
> Let me know what you think.
>
> Best, Fabian
>
>
> 2018-02-14 3:07 GMT+01:00 Rong Rong <walter...@gmail.com>:
>
> > Hi Community,
> >
> > We are working on support for distinct aggregators over data streams in
> > the Table/SQL API. Currently there seem to be many JIRAs related to
> > distinct agg over stream use cases which are still pending (FLINK-6249
> > <https://issues.apache.org/jira/browse/FLINK-6249>, FLINK-6260
> > <https://issues.apache.org/jira/browse/FLINK-6260>, FLINK-5315
> > <https://issues.apache.org/jira/browse/FLINK-5315>, FLINK-6335
> > <https://issues.apache.org/jira/browse/FLINK-6335>, FLINK-6373
> > <https://issues.apache.org/jira/browse/FLINK-6373>, FLINK-6250
> > <https://issues.apache.org/jira/browse/FLINK-6250>, etc) and I am having
> > some concerns when trying to come up with a solution as there might be
> > other use cases out there.
> >
> > I wrote up a summary that categorizes the use cases into unbounded and
> > bounded aggregations and proposes a solution: modifying and adding
> > new distinct aggregate functions using the UDAGG API with DataView.
> > Please find it here
> > <https://docs.google.com/document/d/1zj6OA-K2hi7ah8Fo-xTQB-mVmYfm6LsN2_NHgTCVmJI/edit?usp=sharing>.
> >
> > Any comments or suggestions are highly appreciated.
> >
> > Many Thanks,
> > Rong
> >
>

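For readers following the thread: the design doc itself is not reproduced
here, so the following is only a minimal plain-Python sketch of the general
idea behind a distinct aggregate whose accumulator keeps a map from value to
occurrence count. It is not Flink's UDAGG or DataView API; the class name,
method names, and the use of a dict in place of a map-style state view are
all assumptions made for illustration.

```python
from collections import defaultdict


class DistinctCountAccumulator:
    """Hypothetical accumulator for COUNT(DISTINCT x).

    `seen` maps each value to how many times it has been accumulated.
    It stands in for a map-style state view; this is an assumption,
    not the actual Flink implementation.
    """

    def __init__(self):
        self.seen = defaultdict(int)
        self.distinct = 0

    def accumulate(self, value):
        # Only the first occurrence of a value changes the result.
        self.seen[value] += 1
        if self.seen[value] == 1:
            self.distinct += 1

    def retract(self, value):
        # Only removing the last occurrence of a value changes the result.
        count = self.seen.get(value, 0)
        if count == 0:
            return  # value was never accumulated; nothing to retract
        if count == 1:
            del self.seen[value]
            self.distinct -= 1
        else:
            self.seen[value] = count - 1

    def get_value(self):
        return self.distinct


def demo():
    acc = DistinctCountAccumulator()
    for v in ["a", "b", "a", "c"]:
        acc.accumulate(v)
    after_inserts = acc.get_value()
    acc.retract("a")  # one "a" remains, so the distinct count is unchanged
    after_first_retract = acc.get_value()
    acc.retract("a")  # last "a" removed, so the distinct count drops
    after_second_retract = acc.get_value()
    return after_inserts, after_first_retract, after_second_retract
```

Keeping a count per value, rather than a plain set, is what lets the sketch
handle retractions: a value only leaves the distinct set once every
accumulated occurrence of it has been retracted.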