Hi Jincheng,

Thanks for the clarification. Take 'TopN' for example, if I want to write
such kind function,
should I make sure that this function is used with some keys? If I need a
use case to calculate
topn without key, should I write another function or I can reuse previous
one.

I'm not sure about the idea of this does not involve semantic changes. To
me, it sounds like
we are doing another nested aggregation inside the table
which TableAggregateFunction emits.

Maybe I'm not familiar with this function enough, hope you can help me to
understand.

Best,
Kurt


On Mon, Jul 1, 2019 at 11:59 AM jincheng sun <sunjincheng...@gmail.com>
wrote:

> Hi Kurt,
>
> Thanks for your questions, I am glad to share my thoughts here:
>
> My question is, will that effect the logic ofTableAggregateFunction user
> > wrote? Should the user know that there will a key and make some changes
> to
> > this function?
>
>
> No, the keys information depends on the implementation of the
> TableAggregateFunction.
> For example, for a `topN` user defined TableAggregateFunction, we can only
> use the `keys` if the `topN` contains `rankid` in the output. You can
> treat the
> `keys` like an indicator.
>
> If not, how will framework deal with the output of user's
> > TableAggregateFunction.  if user output multiple rows with the same key,
> > should be latter one replace previous ones?
>
>
> If a TableAggregateFunction outputs multiple rows with the same key, the
> latter one should replace the previous one, either with upsert mode or
> retract mode. i.e., Whether the user defines the Key or not, the Flink
> framework should ensure the correctness of the semantics.
>
> At present, the problem we are discussing does not involve semantic
> changes. The definition of keys is to support non-window flatAggregate on
> upsert mode. (The upsert mode is already supported in the flink framework.
> The current discussion only needs to inform the framework that the keys
> information, which is the `withKeys` API we discussing.)
>
> Welcome any other feedbacks :)
>
> Best,
> Jincheng
>
> Kurt Young <ykt...@gmail.com> 于2019年7月1日周一 上午9:23写道:
>
> > Hi,
> >
> > I have a question about the key information of TableAggregateFunction.
> > IIUC, you need to define
> > something like primary key or unique key in the result table of
> > TableAggregateFunction, and also
> > need a way to let user configure this through the API. My question is,
> will
> > that effect the logic of
> > TableAggregateFunction user wrote? Should the user know that there will a
> > key and make some changes
> > to this function?
> >
> > If so, what's the semantic the user should learned. If not, how will
> > framework deal with the output of user's
> > TableAggregateFunction. For example, if user output multiple rows with
> the
> > same key, should be latter one
> > replace previous ones?
> >
> > Best,
> > Kurt
> >
> >
> > On Mon, Jul 1, 2019 at 7:19 AM jincheng sun <sunjincheng...@gmail.com>
> > wrote:
> >
> > > Hi hequn, Thanks for the reply! I think `withKeys` solution is our
> better
> > > choice!
> > >
> > >
> > > Hequn Cheng <chenghe...@gmail.com> 于2019年6月26日周三 下午5:11写道:
> > >
> > > > Hi Jincheng,
> > > >
> > > > Thanks for raising the discussion!
> > > > The key information is very important for query optimizations. It
> would
> > > be
> > > > nice if we can use upsert mode to achieve better performance.
> > > >
> > > > +1 for the `withKeys` proposal. :)
> > > >
> > > > Best, Hequn
> > > >
> > > >
> > > > On Wed, Jun 26, 2019 at 4:37 PM jincheng sun <
> sunjincheng...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > With the continuous efforts from the community, we already
> supported
> > > > > `flatAggregate`[1] on TableAPI in retract mode. I think It's better
> > to
> > > > add
> > > > > upsert mode for  `flatAggregate`.
> > > > >
> > > > > The result table of streaming non-window `flatAggregate` is a table
> > > > > contains updates. We can, of course, use a
> RetractStreamTableSink[2]
> > to
> > > > > emit the table, but we can get better performance in upsert mode.
> > > > However,
> > > > > due to the lack of keys, we can’t use an UpsertStreamTableSink to
> > emit
> > > > the
> > > > > table. We don’t have this problem for a normal aggregate as it
> emits
> > a
> > > > > single row for each group, so the unique keys are exactly the same
> > with
> > > > the
> > > > > group keys. While for a `flatAggregate`, its pretty difference that
> > due
> > > > to
> > > > > emits multi rows(a “sub-table”) for a single group. To solve this
> > > > problem,
> > > > > we need to find a way to define keys on flatAggregate, so that we
> can
> > > > also
> > > > > use upsert sink to emit the result table after flatAggregate.
> > > > >
> > > > > So, Aljoscha, Hequn and I prepared a design document for how to
> > define
> > > > the
> > > > > update keys for  `flatAggregate` in upsert mode.  The detail can be
> > > found
> > > > > here:
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/183qHo8PDG-xserDi_AbGP6YX9aPY0rVr80p3O3Gyz6U/edit?usp=sharing
> > > > >
> > > > > I appreciate it if you can have look at the document and any
> comments
> > > are
> > > > > welcome!
> > > > >
> > > > >
> > > > > Best,
> > > > >
> > > > > Jincheng
> > > > >
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97552739
> > > > >
> > > > > [2]
> > > > >
> > > > >
> > > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/sourceSinks.html#defining-a-streamtablesource
> > > > >
> > > >
> > >
> >
>

Reply via email to