Re: Support distinct aggregation over data stream on Table/SQL API

Fabian Hueske Thu, 15 Feb 2018 11:56:00 -0800

Hi Rong,

Thanks for the update!
Please suggest JIRAs to close (or close them yourself if possible) if they
are covered by the ones that you create.


At the moment, we aim for feature parity between SQL and Table API.
So ideally all features are available in both APIs. This is usually not too
complicated, because they have the same internal representation (a Calcite
RelNode tree).
A path that we often take is to start implementing a feature for SQL and
add the missing API and translation step to the Table API afterwards.

In the long run, the Table API might have some shortcuts for features that
are hard to express in SQL but we are not there yet.

Best, Fabian

2018-02-15 20:06 GMT+01:00 Rong Rong <[email protected]>:

> Thanks Fabian for the review,
>
> I will incorporate the feedback, finalize the design doc, and open a
> JIRA to track all sub-tasks.
> Please also feel free to comment if there are any other related DISTINCT
> aggregation use cases not covered by the design doc.
>
> One higher-level question regarding #4: should the Table API's
> functionality always be a superset of the SQL API's?
> I have seen some features that are available in the Table API but not in
> SQL, and I was wondering whether that is a must-obey rule during development.
>
> --
> Rong
>
> On Wed, Feb 14, 2018 at 2:32 AM, Fabian Hueske <[email protected]> wrote:
>
> > Hi Rong,
> >
> > Thanks for taking the initiative to improve the support for DISTINCT
> > aggregations!
> > I've made a pass over your design document and left a couple of comments.
> > I think it is a really good write up and serves as a good start.
> >
> > IMO, the next steps could be to
> > 1) continue and finalize the discussion on the design doc. Feel free to
> > open a new umbrella JIRA and link your doc there.
> > 2) check which JIRAs are still relevant. Close or reorganize them
> > according to the plan in your design doc and make them subissues of the
> > umbrella issue.
> > 3) add support for DISTINCT in SQL
> > 4) later extend the Table API to also support distinct aggregations
> > (this would be mostly API changes since the execution is solved before)
> >
> > Let me know what you think.
> >
> > Best, Fabian
> >
> >
> > 2018-02-14 3:07 GMT+01:00 Rong Rong <[email protected]>:
> >
> > > Hi Community,
> > >
> > > We are working on support of distinct aggregators over data stream on
> > > Table/SQL API. Currently there seem to be many JIRAs related to
> > > distinct agg over stream use cases which are still pending (FLINK-6249
> > > <https://issues.apache.org/jira/browse/FLINK-6249>, FLINK-6260
> > > <https://issues.apache.org/jira/browse/FLINK-6260>, FLINK-5315
> > > <https://issues.apache.org/jira/browse/FLINK-5315>, FLINK-6335
> > > <https://issues.apache.org/jira/browse/FLINK-6335>, FLINK-6373
> > > <https://issues.apache.org/jira/browse/FLINK-6373>, FLINK-6250
> > > <https://issues.apache.org/jira/browse/FLINK-6250>, etc.) and I am
> > > having some concerns when trying to come up with a solution as there
> > > might be other use cases out there.
> > >
> > > I wrote up a summary that categorizes the use cases into unbounded and
> > > bounded aggregations and proposes a solution: modifying existing and
> > > adding new distinct aggregate functions using the UDAGG API with
> > > DataView. Please find it here
> > > <https://docs.google.com/document/d/1zj6OA-K2hi7ah8Fo-xTQB-mVmYfm6LsN2_NHgTCVmJI/edit?usp=sharing>.
> > >
> > > Any comments or suggestions are highly appreciated.
> > >
> > > Many Thanks,
> > > Rong
> > >
> >
>
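
For readers following the thread, here is a rough sketch of the general idea
behind a map-backed distinct aggregate, as mentioned in the proposal quoted
above. It is not taken from the design doc and does not use the Flink API: the
class name DistinctCountAccumulator is hypothetical, the plain dict merely
stands in for a state-backed map such as a DataView, and the retract step
assumes that bounded (windowed) aggregations need to remove values that leave
the window.

```python
class DistinctCountAccumulator:
    """Hypothetical accumulator for COUNT(DISTINCT x); not the Flink API."""

    def __init__(self):
        # value -> number of times it is currently present.
        # A plain dict stands in for a state-backed map view (assumption).
        self.seen = {}
        self.distinct = 0

    def accumulate(self, value):
        # Only the first occurrence of a value changes the distinct count.
        count = self.seen.get(value, 0)
        if count == 0:
            self.distinct += 1
        self.seen[value] = count + 1

    def retract(self, value):
        # Undo one earlier accumulate; the distinct count drops only when
        # the last occurrence of the value is removed.
        count = self.seen.get(value, 0)
        if count == 0:
            return  # value was never accumulated; nothing to undo
        if count == 1:
            del self.seen[value]
            self.distinct -= 1
        else:
            self.seen[value] = count - 1

    def get_value(self):
        return self.distinct


acc = DistinctCountAccumulator()
for v in ["a", "b", "a"]:
    acc.accumulate(v)
print(acc.get_value())
acc.retract("a")
print(acc.get_value())
```

Keeping a per-value count rather than a plain set is what makes retraction
possible: a value only stops being counted once every occurrence of it has
been retracted.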
