Hi Xuyang,

Thanks a lot for starting this discussion.

At first, I was a bit confused because the FLIP talks about
deprecating the Legacy Group Window Aggregations, but they have
already been marked as deprecated in the documentation [1].
My understanding was that the big challenge was that we don't yet
support SESSION windows in the Window TVF, and that the other features
you've mentioned in the discussion threads are additional
capabilities.

However, when reading up on the actual FLIP (your discussion email
didn't include a link [2] to it) I now understand the situation. The
appendix table is actually the most valuable for me, because it gives
me the overview of the missing capabilities between TVF implementation
and the Legacy Group Window Aggregations.

On the topic of syntax for early/late fires, there is existing
configuration for the legacy group windows:

SET table.exec.emit.early-fire.enabled = true;
SET table.exec.emit.early-fire.delay = 5s;
SET table.exec.emit.late-fire.enabled = true;
SET table.exec.emit.late-fire.delay = 0;
SET table.exec.emit.allow-lateness = 5s;

We should stick with the syntax for the TVFs, and not modify that.

On the topic of column naming, for other situations where a user wants
to use a value that's already reserved, we require the user to include
backticks to indicate that Flink should not use the reserved keyword
implementation. Why isn't that sufficient in this case? I rather stay
consistent with this behaviour instead of introducing new config
options.

Overall, +1 on this topic. I would propose to rename the FLIP to
something like "Add Missing Table Valued Functions Features to Replace
Legacy Group Window Aggregation".

Best regards,

Martijn

[1] 
https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/table/sql/queries/window-agg/#group-window-aggregation
[2] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-392%3A+Deprecate+the+Legacy+Group+Window+Aggregation


On Wed, Dec 6, 2023 at 9:32 AM Xuyang <xyzhong...@163.com> wrote:
>
> Hi, Shengkai. Thanks to share your thought. Let me answer related questions。
>
>
> > Could you give an example about the pass-through column. A session window 
> > may contain multiple rows, which value is selected by the windowoperator?
>
>
> The table function make the entire inpyt row available in the output. Take an 
> example in Flink, if the input row is <a,b,c> with schema <f0,f1,f2>,
> the output of the window tvf can also access the <a,b,c> with schema 
> <f0,f1,f2>. The session window always output these multi rows with attached 
> window_start,
> window_end and window_time column.
>
>
> > What's the behavior if all the data in the window have been removed?
>
>
> Align the behavior of existing group window agg, and do not output data when 
> the window is empty.
>
>
> > Could you explain more details about the how window process CDC? For 
> > example, what's the behavior if the window only gets the DELETE data from 
> > the upstream Operator.
>
>
> Also align the behavior of existing group window agg. However, there is a bug 
> in group window agg that currently group window agg has different results 
> when only
> consuming -D records while using or not using minibatch. Refs more at 
> FLINK-33760.
>
>
> > The subtitle is not correct here.
>
>
> Updated the doc to fix it.
>
>
> > It's better if we can introduce a syntax like the `emit` keyword to set the 
> > emit strategy for every window.
> I agree with you. But I don't recommend using this syntax SESSION(data 
> [PARTITION BY (keycols, ...)], DESCRIPTOR(timecol), gap, emit='') because it 
> breaks the syntax in Flip-145.
> I think using query hint is a better idea. Anyway, this work should belong to 
> another flip.
>
>
> > I think more work should be mentioned in the FLIP. What's the behavior if 
> > the input schema contains a column named `window_start`?
> > In the current design, `window_start` is a reserved keyword in the window 
> > TVF syntax, but it is legal in the legacy implementation.
>
>
> This is a good question. Perhaps we can introduce a config to add a specific 
> config (named "table.window-additional-columns.prefix") to add a prefix
> of all window addition columns to solve this situation. For example, user can 
> set the conf to "$", and the additional column from window will become
> "$window_start", "$window_end" and "$window_time". WDYT?
>
>
> > In the FLIP, you mention the FLIP should introduce an option to fall back 
> > to the legacy behavior. Could you tell us what's the name of the option?
> > BTW, I think we should unify the implementation when window TVF can do all 
> > the work that the legacy operator can do and there is no need to introduce 
> > an option to fallback.
>
>
> What about using config named "table.optimizer.window-rewrite-enabled"? If
> I agree with you that util all features are aligned and everything is ok 
> about window tvf, we should also remove this config about fallback.
> But I think to be on the safe side, we can observe one or two versions of 
> this rewrite, and allows users to roll back when problems arise.
>
>
> > If we remove the legacy window operator in the future, how users upgrade 
> > their jobs? Do you have any plan to support state migration from the legacy 
> > window to Windows TVF?
> IIRC, currently compatibility across middle versions of SQL is not 
> guaranteed. Should we add constraints on this part?
>
>
> Look for your feedback!
>
>
>
>
>
>
>
> --
>
>     Best!
>     Xuyang
>
>
>
>
>
> At 2023-12-05 12:06:27, "liu ron" <ron9....@gmail.com> wrote:
> >Hi, xuyang
> >
> >Thanks for starting this FLIP discussion, currently there are two types of
> >window aggregation in Flink SQL, namely legacy group window aggregation and
> >window tvf aggregation, these two types of window aggregation are not fully
> >aligned in behavior, which will bring a lot of confusion to the users, so
> >there is a need to unify and align them. I think the final ideal state
> >should be that there is only one window tvf aggregation, which supports
> >Tumble, HOP, Cumulate and Session windows, and supports consuming CDC data
> >streams. There is also support for configuring EARLY-FIRE and LATER-FIRE.
> >
> >This FLIP is a continuation of FLIP-145, and also supports legacy group
> >window aggregation to flat-migrate to the new window tvf agregation, which
> >is very useful, especially for the support of CDC streams, a pain point
> >that users often feedback. Big +1 for this FLIP.
> >
> >Best,
> >Ron
> >
> >Xuyang <xyzhong...@163.com> 于2023年12月5日周二 11:11写道:
> >
> >> Hi, Feng and David.
> >>
> >>
> >> Thank you very much to share your thoughts.
> >>
> >>
> >> This flip does not include the official exposure of these experimental
> >> conf to users. Thus there is not adetailed description of this part.
> >> However, in view that some technical users may have added these
> >> experimental conf in actual production jobs, the processing
> >> of these conf while using window tvf syntax has been added to this flip.
> >>
> >>
> >> Overall, the behavior of using these experimental parameters is no
> >> different from before, and I think we should provide the compatibility
> >> about using these experimental conf.
> >>
> >>
> >> Look for your thoughs.
> >>
> >>
> >>
> >>
> >> --
> >>
> >>     Best!
> >>     Xuyang
> >>
> >>
> >>
> >>
> >>
> >> At 2023-12-05 09:17:49, "David Anderson" <dander...@apache.org> wrote:
> >> >The current situation (where we have both the legacy windows and the
> >> >TVF-based windows) is confusing for users, and I'd like to see us move
> >> >forward as rapidly as possible.
> >> >
> >> >Since the early fire, late fire, and allowed lateness features were never
> >> >documented or exposed to users, I don't feel that we need to provide
> >> >replacements for these internal, experimental features before officially
> >> >deprecating the legacy group window aggregations, and I'd rather not wait.
> >> >
> >> >However, I'd be delighted to see a proposal for what that might look like.
> >> >
> >> >Best,
> >> >David
> >> >
> >> >On Mon, Dec 4, 2023 at 12:45 PM Feng Jin <jinfeng1...@gmail.com> wrote:
> >> >
> >> >> Hi xuyang,
> >> >>
> >> >> Thank you for initiating this proposal.
> >> >>
> >> >> I'm glad to see that TVF's functionality can be fully supported.
> >> >>
> >> >> Regarding the early fire, late fire, and allow lateness features, how
> >> will
> >> >> they be provided to users? The documentation doesn't seem to provide a
> >> >> detailed description of this part.
> >> >>
> >> >> Since this FLIP will also involve a lot of feature development, I am
> >> more
> >> >> than willing to help, including development and code review.
> >> >>
> >> >> Best,
> >> >> Feng
> >> >>
> >> >> On Tue, Nov 28, 2023 at 8:31 PM Xuyang <xyzhong...@163.com> wrote:
> >> >>
> >> >> > Hi all.
> >> >> > I'd like to start a discussion of FLIP-392: Deprecate the Legacy Group
> >> >> > Window Aggregation.
> >> >> >
> >> >> >
> >> >> > Although the current Flink SQL Window Aggregation documentation[1]
> >> >> > indicates that the legacy Group Window Aggregation
> >> >> > syntax has been deprecated, the new Window TVF Aggregation syntax has
> >> not
> >> >> > fully covered all of the features of the legacy one.
> >> >> >
> >> >> >
> >> >> > Compared to Group Window Aggergation, Window TVF Aggergation has
> >> several
> >> >> > advantages, such as two-stage optimization,
> >> >> > support for standard GROUPING SET syntax, and so on. However, it
> >> needs to
> >> >> > supplement and enrich the following features.
> >> >> >
> >> >> >
> >> >> > 1. Support for SESSION Window TVF Aggregation
> >> >> > 2. Support for consuming CDC stream
> >> >> > 3. Support for HOP window size with non-integer step length
> >> >> > 4. Support for configurations such as early fire, late fire and allow
> >> >> > lateness
> >> >> > (which are internal experimental configurations in Group Window
> >> >> > Aggregation and not public to users yet.)
> >> >> > 5. Unification of the Window TVF Aggregation operator in runtime at
> >> the
> >> >> > implementation layer
> >> >> > (In the long term, the cost to maintain the operators about Window TVF
> >> >> > Aggregation and Group Window Aggregation is too expensive.)
> >> >> >
> >> >> >
> >> >> > This flip aims to continue the unfinished work in FLIP-145[2], which
> >> is
> >> >> to
> >> >> > fully enable the capabilities of Window TVF Aggregation
> >> >> >  and officially deprecate the legacy syntax Group Window Aggregation,
> >> to
> >> >> > prepare for the removal of the legacy one in Flink 2.0.
> >> >> >
> >> >> >
> >> >> > I have already done some preliminary POC to validate the feasibility
> >> of
> >> >> > the related work in this flip as follows.
> >> >> > 1. POC for SESSION Window TVF Aggregation [3]
> >> >> > 2. POC for CUMULATE in Group Window Aggregation operator [4]
> >> >> > 3. POC for consuming CDC stream in Window Aggregation operator [5]
> >> >> >
> >> >> >
> >> >> > Looking forward to your feedback and thoughts!
> >> >> >
> >> >> >
> >> >> >
> >> >> > [1]
> >> >> >
> >> >>
> >> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/window-agg/
> >> >> >
> >> >> > [2]
> >> >> >
> >> >>
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-SessionWindows
> >> >> > [3] https://github.com/xuyangzhong/flink/tree/FLINK-24024
> >> >> > [4]
> >> >> >
> >> >>
> >> https://github.com/xuyangzhong/flink/tree/poc_legacy_group_window_agg_cumulate
> >> >> > [5]
> >> >> >
> >> >>
> >> https://github.com/xuyangzhong/flink/tree/poc_window_agg_consumes_cdc_stream
> >> >> >
> >> >> >
> >> >> >
> >> >> > --
> >> >> >
> >> >> >     Best!
> >> >> >     Xuyang
> >> >>
> >>

Reply via email to