Hi, Shengkai.

> I think we shouldn't remove the operator if we can not give a solution to 
> help users upgrade their jobs. But I think we can delay the discussion until 
> we need to remove the operator. 


+1 for it.







--

    Best!
    Xuyang




在 2023-12-08 19:22:40,"Shengkai Fang" <fskm...@gmail.com> 写道:

Hi, Xuyang. Thanks for your response.


I just thought an another way to solve this question instead of introducing a 
new configuration. When using legacy syntax like `GROUP BY TUMBLE(xxx), f0`, 
the rewritten sql can be GROUP BY f0, window_start, window_end(window_start and 
window_end is produced by WINDOW TVF). We can use field index here and use a 
new Calc node to alias them to avoid field name conflict in WindowAggregate 
node.


The solution is much better than before and I think it can solve the problem I 
mentioned before. 


What about using config named "table.optimizer.window-rewrite-enabled"


+1  


IIRC, currently compatibility across middle versions of SQL is not guaranteed. 
Should we add constraints on this part?


I think we shouldn't remove the operator if we can not give a solution to help 
users upgrade their jobs. But I think we can delay the discussion until we need 
to remove the operator. 


Best,
Shengkai


Xuyang <xyzhong...@163.com> 于2023年12月8日周五 18:07写道:

Hi, Martijn, thanks for your share.


>On the topic of syntax for early/late fires, there is existing
>configuration for the legacy group windows:
>
>SET table.exec.emit.early-fire.enabled = true;
>SET table.exec.emit.early-fire.delay = 5s;
>SET table.exec.emit.late-fire.enabled = true;
>SET table.exec.emit.late-fire.delay = 0;
>SET table.exec.emit.allow-lateness = 5s;
>
>We should stick with the syntax for the TVFs, and not modify that.
Agree with you. We should follow the syntax defined in Flip-145. As for how to 
let these options to only take effect on a single window agg instead of all 
window aggs, we need to think of another way.
>On the topic of column naming, for other situations where a user wants
>to use a value that's already reserved, we require the user to include
>backticks to indicate that Flink should not use the reserved keyword
>implementation. Why isn't that sufficient in this case? I rather stay
>consistent with this behaviour instead of introducing new config
>options.
>
I just thought an another way to solve this question instead of introducing a 
new configuration. When using legacy syntax like `GROUP BY TUMBLE(xxx), f0`, 
the rewritten sql can be GROUP BY f0, window_start, window_end(window_start and 
window_end is produced by WINDOW TVF). We can use field index here and use a 
new Calc node to alias them to avoid field name conflict in WindowAggregate 
node.
What about this idea, cc  @Shengkai Fang ?


>I would propose to rename the FLIP to
>something like "Add Missing Table Valued Functions Features to Replace
>Legacy Group Window Aggregation".
IMO, the original title "deprecate" maybe more clear. Because although the doc 
has deprecated the legacy Group Window Aggregation syntax just like what I said 
in this Flip, but the DEPRECATED tag in doc is attached when doing the 
Flip-145. Let's review the content in Flip-145 again. The Flip-145 said "The 
existing Grouped window functions, i.e. GROUP BY TUMBLE... are still supported, 
but will be deprecated". So as a work to follow Flip-145, this Flip officially 
deprecates the legacy window syntax at this time. WDTK?



--

    Best!
    Xuyang





At 2023-12-07 16:51:44, "Martijn Visser" <martijnvis...@apache.org> wrote:
>Hi Xuyang,
>
>Thanks a lot for starting this discussion.
>
>At first, I was a bit confused because the FLIP talks about
>deprecating the Legacy Group Window Aggregations, but they have
>already been marked as deprecated in the documentation [1].
>My understanding was that the big challenge was that we don't yet
>support SESSION windows in the Window TVF, and that the other features
>you've mentioned in the discussion threads are additional
>capabilities.
>
>However, when reading up on the actual FLIP (your discussion email
>didn't include a link [2] to it) I now understand the situation. The
>appendix table is actually the most valuable for me, because it gives
>me the overview of the missing capabilities between TVF implementation
>and the Legacy Group Window Aggregations.
>
>On the topic of syntax for early/late fires, there is existing
>configuration for the legacy group windows:
>
>SET table.exec.emit.early-fire.enabled = true;
>SET table.exec.emit.early-fire.delay = 5s;
>SET table.exec.emit.late-fire.enabled = true;
>SET table.exec.emit.late-fire.delay = 0;
>SET table.exec.emit.allow-lateness = 5s;
>
>We should stick with the syntax for the TVFs, and not modify that.
>
>On the topic of column naming, for other situations where a user wants
>to use a value that's already reserved, we require the user to include
>backticks to indicate that Flink should not use the reserved keyword
>implementation. Why isn't that sufficient in this case? I rather stay
>consistent with this behaviour instead of introducing new config
>options.
>
>Overall, +1 on this topic. I would propose to rename the FLIP to
>something like "Add Missing Table Valued Functions Features to Replace
>Legacy Group Window Aggregation".
>
>Best regards,
>
>Martijn
>
>[1] 
>https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/table/sql/queries/window-agg/#group-window-aggregation
>[2] 
>https://cwiki.apache.org/confluence/display/FLINK/FLIP-392%3A+Deprecate+the+Legacy+Group+Window+Aggregation
>
>
>On Wed, Dec 6, 2023 at 9:32 AM Xuyang <xyzhong...@163.com> wrote:
>>
>> Hi, Shengkai. Thanks to share your thought. Let me answer related questions。
>>
>>
>> > Could you give an example about the pass-through column. A session window 
>> > may contain multiple rows, which value is selected by the windowoperator?
>>
>>
>> The table function make the entire inpyt row available in the output. Take 
>> an example in Flink, if the input row is <a,b,c> with schema <f0,f1,f2>,
>> the output of the window tvf can also access the <a,b,c> with schema 
>> <f0,f1,f2>. The session window always output these multi rows with attached 
>> window_start,
>> window_end and window_time column.
>>
>>
>> > What's the behavior if all the data in the window have been removed?
>>
>>
>> Align the behavior of existing group window agg, and do not output data when 
>> the window is empty.
>>
>>
>> > Could you explain more details about the how window process CDC? For 
>> > example, what's the behavior if the window only gets the DELETE data from 
>> > the upstream Operator.
>>
>>
>> Also align the behavior of existing group window agg. However, there is a 
>> bug in group window agg that currently group window agg has different 
>> results when only
>> consuming -D records while using or not using minibatch. Refs more at 
>> FLINK-33760.
>>
>>
>> > The subtitle is not correct here.
>>
>>
>> Updated the doc to fix it.
>>
>>
>> > It's better if we can introduce a syntax like the `emit` keyword to set 
>> > the emit strategy for every window.
>> I agree with you. But I don't recommend using this syntax SESSION(data 
>> [PARTITION BY (keycols, ...)], DESCRIPTOR(timecol), gap, emit='') because it 
>> breaks the syntax in Flip-145.
>> I think using query hint is a better idea. Anyway, this work should belong 
>> to another flip.
>>
>>
>> > I think more work should be mentioned in the FLIP. What's the behavior if 
>> > the input schema contains a column named `window_start`?
>> > In the current design, `window_start` is a reserved keyword in the window 
>> > TVF syntax, but it is legal in the legacy implementation.
>>
>>
>> This is a good question. Perhaps we can introduce a config to add a specific 
>> config (named "table.window-additional-columns.prefix") to add a prefix
>> of all window addition columns to solve this situation. For example, user 
>> can set the conf to "$", and the additional column from window will become
>> "$window_start", "$window_end" and "$window_time". WDYT?
>>
>>
>> > In the FLIP, you mention the FLIP should introduce an option to fall back 
>> > to the legacy behavior. Could you tell us what's the name of the option?
>> > BTW, I think we should unify the implementation when window TVF can do all 
>> > the work that the legacy operator can do and there is no need to introduce 
>> > an option to fallback.
>>
>>
>> What about using config named "table.optimizer.window-rewrite-enabled"? If
>> I agree with you that util all features are aligned and everything is ok 
>> about window tvf, we should also remove this config about fallback.
>> But I think to be on the safe side, we can observe one or two versions of 
>> this rewrite, and allows users to roll back when problems arise.
>>
>>
>> > If we remove the legacy window operator in the future, how users upgrade 
>> > their jobs? Do you have any plan to support state migration from the 
>> > legacy window to Windows TVF?
>> IIRC, currently compatibility across middle versions of SQL is not 
>> guaranteed. Should we add constraints on this part?
>>
>>
>> Look for your feedback!
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>>     Best!
>>     Xuyang
>>
>>
>>
>>
>>
>> At 2023-12-05 12:06:27, "liu ron" <ron9....@gmail.com> wrote:
>> >Hi, xuyang
>> >
>> >Thanks for starting this FLIP discussion, currently there are two types of
>> >window aggregation in Flink SQL, namely legacy group window aggregation and
>> >window tvf aggregation, these two types of window aggregation are not fully
>> >aligned in behavior, which will bring a lot of confusion to the users, so
>> >there is a need to unify and align them. I think the final ideal state
>> >should be that there is only one window tvf aggregation, which supports
>> >Tumble, HOP, Cumulate and Session windows, and supports consuming CDC data
>> >streams. There is also support for configuring EARLY-FIRE and LATER-FIRE.
>> >
>> >This FLIP is a continuation of FLIP-145, and also supports legacy group
>> >window aggregation to flat-migrate to the new window tvf agregation, which
>> >is very useful, especially for the support of CDC streams, a pain point
>> >that users often feedback. Big +1 for this FLIP.
>> >
>> >Best,
>> >Ron
>> >
>> >Xuyang <xyzhong...@163.com> 于2023年12月5日周二 11:11写道:
>> >
>> >> Hi, Feng and David.
>> >>
>> >>
>> >> Thank you very much to share your thoughts.
>> >>
>> >>
>> >> This flip does not include the official exposure of these experimental
>> >> conf to users. Thus there is not adetailed description of this part.
>> >> However, in view that some technical users may have added these
>> >> experimental conf in actual production jobs, the processing
>> >> of these conf while using window tvf syntax has been added to this flip.
>> >>
>> >>
>> >> Overall, the behavior of using these experimental parameters is no
>> >> different from before, and I think we should provide the compatibility
>> >> about using these experimental conf.
>> >>
>> >>
>> >> Look for your thoughs.
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >>     Best!
>> >>     Xuyang
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> At 2023-12-05 09:17:49, "David Anderson" <dander...@apache.org> wrote:
>> >> >The current situation (where we have both the legacy windows and the
>> >> >TVF-based windows) is confusing for users, and I'd like to see us move
>> >> >forward as rapidly as possible.
>> >> >
>> >> >Since the early fire, late fire, and allowed lateness features were never
>> >> >documented or exposed to users, I don't feel that we need to provide
>> >> >replacements for these internal, experimental features before officially
>> >> >deprecating the legacy group window aggregations, and I'd rather not 
>> >> >wait.
>> >> >
>> >> >However, I'd be delighted to see a proposal for what that might look 
>> >> >like.
>> >> >
>> >> >Best,
>> >> >David
>> >> >
>> >> >On Mon, Dec 4, 2023 at 12:45 PM Feng Jin <jinfeng1...@gmail.com> wrote:
>> >> >
>> >> >> Hi xuyang,
>> >> >>
>> >> >> Thank you for initiating this proposal.
>> >> >>
>> >> >> I'm glad to see that TVF's functionality can be fully supported.
>> >> >>
>> >> >> Regarding the early fire, late fire, and allow lateness features, how
>> >> will
>> >> >> they be provided to users? The documentation doesn't seem to provide a
>> >> >> detailed description of this part.
>> >> >>
>> >> >> Since this FLIP will also involve a lot of feature development, I am
>> >> more
>> >> >> than willing to help, including development and code review.
>> >> >>
>> >> >> Best,
>> >> >> Feng
>> >> >>
>> >> >> On Tue, Nov 28, 2023 at 8:31 PM Xuyang <xyzhong...@163.com> wrote:
>> >> >>
>> >> >> > Hi all.
>> >> >> > I'd like to start a discussion of FLIP-392: Deprecate the Legacy 
>> >> >> > Group
>> >> >> > Window Aggregation.
>> >> >> >
>> >> >> >
>> >> >> > Although the current Flink SQL Window Aggregation documentation[1]
>> >> >> > indicates that the legacy Group Window Aggregation
>> >> >> > syntax has been deprecated, the new Window TVF Aggregation syntax has
>> >> not
>> >> >> > fully covered all of the features of the legacy one.
>> >> >> >
>> >> >> >
>> >> >> > Compared to Group Window Aggergation, Window TVF Aggergation has
>> >> several
>> >> >> > advantages, such as two-stage optimization,
>> >> >> > support for standard GROUPING SET syntax, and so on. However, it
>> >> needs to
>> >> >> > supplement and enrich the following features.
>> >> >> >
>> >> >> >
>> >> >> > 1. Support for SESSION Window TVF Aggregation
>> >> >> > 2. Support for consuming CDC stream
>> >> >> > 3. Support for HOP window size with non-integer step length
>> >> >> > 4. Support for configurations such as early fire, late fire and allow
>> >> >> > lateness
>> >> >> > (which are internal experimental configurations in Group Window
>> >> >> > Aggregation and not public to users yet.)
>> >> >> > 5. Unification of the Window TVF Aggregation operator in runtime at
>> >> the
>> >> >> > implementation layer
>> >> >> > (In the long term, the cost to maintain the operators about Window 
>> >> >> > TVF
>> >> >> > Aggregation and Group Window Aggregation is too expensive.)
>> >> >> >
>> >> >> >
>> >> >> > This flip aims to continue the unfinished work in FLIP-145[2], which
>> >> is
>> >> >> to
>> >> >> > fully enable the capabilities of Window TVF Aggregation
>> >> >> >  and officially deprecate the legacy syntax Group Window Aggregation,
>> >> to
>> >> >> > prepare for the removal of the legacy one in Flink 2.0.
>> >> >> >
>> >> >> >
>> >> >> > I have already done some preliminary POC to validate the feasibility
>> >> of
>> >> >> > the related work in this flip as follows.
>> >> >> > 1. POC for SESSION Window TVF Aggregation [3]
>> >> >> > 2. POC for CUMULATE in Group Window Aggregation operator [4]
>> >> >> > 3. POC for consuming CDC stream in Window Aggregation operator [5]
>> >> >> >
>> >> >> >
>> >> >> > Looking forward to your feedback and thoughts!
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > [1]
>> >> >> >
>> >> >>
>> >> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/window-agg/
>> >> >> >
>> >> >> > [2]
>> >> >> >
>> >> >>
>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-SessionWindows
>> >> >> > [3] https://github.com/xuyangzhong/flink/tree/FLINK-24024
>> >> >> > [4]
>> >> >> >
>> >> >>
>> >> https://github.com/xuyangzhong/flink/tree/poc_legacy_group_window_agg_cumulate
>> >> >> > [5]
>> >> >> >
>> >> >>
>> >> https://github.com/xuyangzhong/flink/tree/poc_window_agg_consumes_cdc_stream
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> >
>> >> >> >     Best!
>> >> >> >     Xuyang
>> >> >>
>> >>

Reply via email to