Thanks for driving this Jark ~ The FLIP overall looks good, i think the window TVFs would be the main "entry point" syntax for our NTR use cases. The syntax originated from the "One SQL To Rule Them All" paper and i think we have reached an agreement.
I want to make some additions to the window TVF syntax here - We support the standard syntax of polymorphic table functions with named parameters. e.g. select * from table( tumble( DATA => table Shipments, TIMECOL => descriptor(rowtime), SIZE => INTERVAL '1' MINUTE)) - The first parameter can also be any form of sub-query, e.g. select * from table( tumble((select * from Shipments), descriptor(rowtime), INTERVAL '1' MINUTE)) The current proposed syntax is a simplified one and for most of the cases it is more easy to use. The question asked by Benchao is reasonable, as we suggest the window TVF as a normal UDTF, it can be chained with any kind of other non-windowed operators, we may need some time to give clear semantics for them (especially it is good if there are real use cases), as a start, let us focus the windowed operations first. I'm also +1 for voting ~ Best, Danny Chan 在 2020年10月10日 +0800 PM2:03,Benchao Li <libenc...@apache.org>,写道: > Hi Jark, > > Thanks for your reply, this makes sense to me. > > The scenario I used above is just a case to explain what I'm concerned > about, > not necessarily a production use case. We can leave it to the future to see > whether > other users have these use cases. > > Then I have no other concerns, +1 to start the VOTE. > > > Jark Wu <imj...@gmail.com> 于2020年10月10日周六 下午1:44写道: > > > Hi Benchao, > > > > That's a good question. > > > > IMO, the new windowed operators and the current time operators are two > > different sets of functions, > > just like time operators and non-time operators are two different sets of > > functions. > > I think it's fine if we don't support integrating them, just like time > > operators can't be applied on non-windowed aggregate. > > If users want to use time operators in the whole pipeline, then he/she can > > use the grouped window aggregates instead of the window TVFs. > > > > The key idea of window TVF is that all the operators in the pipeline are > > based on the **windows**. > > In terms of syntax, if the key clause (e.g. group by, partitioned by, join > > on, order by) contains window_start and window_end, > > it can be translated into windowed operators. > > Thus, we will have windowed CEP, windowed sort, windowed over aggregate in > > the future to make it possible to build a windowed pipeline. > > > > But I think we can elaborate the integration more in the future if users > > need it. Actually, I don't fully understand the scenario of integrating > > window TVF and time operators at this point. > > For example, interval join an input stream and a window join result. I > > don't see why it can't be expressed by nested window join and why users > > have to use interval join here. > > Maybe we can wait for more inputs from users when the window TVF is > > released and we can elaborate it again. > > > > Best, > > Jark > > > > On Sat, 10 Oct 2020 at 12:01, 刘 芃成 <pengchengliucr...@gmail.com> wrote: > > > > > Hi, Benchao, > > > I think I got your point, actually, in current implementation for > > > group window aggregation, the value of time attributes(e.g. > > > TUMBLE_ROWTIME/TUMBLE_PROCTIME) is calculated as (window_end – 1), so I > > > think we can just use it directly if you need this. But I think this time > > > attributes is mainly suggested to use in case of cascaded window > > > operations. > > > Regarding the example you provided, I think the semantics of the SQL in > > > your example which doing interval join(e.g. with TUMBLE_ROWTIME) after > > > window aggregation is not clear in the current implementation, and I think > > > that’s a strong reason why we need the new TVFs syntax. > > > With the new syntax, users should understand which time column to > > > use and how to generate it when doing interval join and etc. > > > > > > Best, > > > Pengcheng > > > > > > 发件人: Benchao Li <libenc...@apache.org> > > > 日期: 2020年10月10日 星期六 上午11:02 > > > 收件人: pengcheng Liu <pengchengliucr...@gmail.com> > > > 抄送: dev <dev@flink.apache.org> > > > 主题: Re: [DISCUSS] FLIP-145: Support SQL windowing table-valued function > > > > > > Hi pengcheng, > > > > > > Thanks for your response. > > > I knew that the original time attribute column will be retained after the > > > TVF, > > > what I'm questioning is how do we get the time attribute column after > > > Aggregation. > > > Your answer did not remove my doubts about this. > > > > > > It's ok if we did not plan to integrate new TVF aggregate with old "time > > > attribute scenarios" > > > listed in my previous email in this FLIP. However it's good to elaborate > > > more on this, and > > > leave it to the future plan. > > > > > > pengcheng Liu <pengchengliucr...@gmail.com<mailto: > > > pengchengliucr...@gmail.com>> 于2020年10月10日周六 上午10:45写道: > > > Hi,Benchao, > > > In TVFs, the time attributes is just passed through from parent rels, > > > and the TVFs just add two > > > additional window attributes(i.e. window_start & window_end). Also, I > > > think the time columns can be not only a time attribute > > > with type of `TimeIndicatorType` but also a regular column with type > > > of `Timestamp`. > > > > > > For cascaded window operations, we can use window_start/window_end of > > > the previous window result directly to > > > indicate operating on the same window, or use new DESCRIPTOR column > > > to assign new windows, in case of the change of > > > the time column(e.g. in some case, the original timestamp is > > > inaccurate and need some conversion to be used). > > > > > > You can check the definition or signature of these TVFs in the FLIP. > > > e.g. > > > SELECT * FROM TABLE( > > > TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES)) > > > In the example, the `bidtime` is the time attribute column, which is > > > the first operand of the DESCRIPTOR function. > > > > > > +1 start voting. > > > > > > Benchao Li <libenc...@apache.org<mailto:libenc...@apache.org>> > > > 于2020年10月10日周六 上午10:08写道: > > > Hi Jark, > > > > > > 2 & 3 sounds good to me. > > > > > > Regarding time attribute, > > > I still have some questions, I knew it's easy to support cascaded window > > > aggregate using new TVFs. > > > However there are some other places where need time attribute: > > > - CEP > > > - interval join > > > - order by > > > - over window > > > If there is no time attribute column, how do we integrate these old > > > features with the new TVFs. > > > E.g. > > > StreamA -> new window aggregate -> interval join -> Sink > > > / > > > StreamB ----------------------------------- > > > > > > > > > Jark Wu <imj...@gmail.com<mailto:imj...@gmail.com>> 于2020年10月9日周五 > > > 下午11:51写道: > > > Hi Benchao, > > > > > > 1) time attribute > > > Yes. We don't need time attribute auxiliary function. Because the new > > > window operations are all based on the > > > window_start and window_end columns instead of on the time attributes. So > > > we don't need to propagate time attributes. > > > Cascaded window aggregate can be expressed by simply GROUP BY the > > > window_start and window_end of the previous window result. > > > I have added a cascaded window aggregate example in the Tumbling Window > > > section in the FLIP. > > > If you want to define proctime window aggregate, the time column in TVF > > > should be a proctime attribute field (or PROCTIME() function). > > > > > > 2) batch support > > > Yes. The proposed syntax/API are unified for batch and streaming. Batch > > > support is in the plan, but may not have enough time to catch up 1.12. > > > > > > 3) support `grouping sets` > > > This is not included in the FLIP, but I think it's great if we can support > > > `grouping sets`. > > > The existing window impl doesn't support this because we convert the > > > LogicalAggregate into WindowAggregate in the beginning, > > > the expand grouping sets rule can't be applied in this situation. > > > Fortunately, with the new window impl, the conversion to WindowAggregate > > > will happen at the end, so I think the expand rule can be > > > applied and support this feature naturally. > > > Therefore, IMO, we don't need to include this feature in this FLIP to > > > avoid > > > the FLIP being too large. > > > This can be a follow-up issue (maybe just add tests and docs) after the > > > FLIP. > > > > > > Best, > > > Jark > > > > > > > > > On Fri, 9 Oct 2020 at 19:09, 刘 芃成 <pengchengliucr...@gmail.com<mailto: > > > pengchengliucr...@gmail.com>> wrote: > > > > > > > Hi,Benchao, > > > > Welcome to join the discussion, yes, this new syntax can make > > > SQL > > > > more clear and simpler. > > > > For your first question, the `window_start` and `window_end` > > > > columns will be added automatically, > > > > so we don't need to use auxiliary group functions to infer or > > > > access the window properties. > > > > > > > > For the `grouping sets` on TVFs, I think it's interesting if we > > > > can support it, as we already supported `grouping sets` > > > > on streaming aggregates in blink planner. But I'm not sure if it > > > > will be included into this FLIP. > > > > > > > > cc @Jark Wu > > > > > > > > Best, > > > > Pengcheng > > > > > > > > > > > > 在 2020/10/9 下午5:25,“Benchao Li”<libenc...@apache.org<mailto: > > > libenc...@apache.org>> 写入: > > > > > > > > Thanks Jark for bringing this discussion, I like this FLIP very > > > much. > > > > > > > > Especially the cumulate window, it's much like the current TUMBLE > > > > window + > > > > Fast Emit (which is an undocumented experimental feature), however, > > > > it's > > > > more powerful. > > > > > > > > And This will make the SQL semantic more standard, especially for > > > the > > > > HOPPING window. > > > > > > > > Regarding time attribute, > > > > It seems that we don't need a specific function to infer the time > > > > attribute > > > > like > > > > `TUMBLE_ROWTIME` / `TUMBLE_PROCTIME`. Then are `window_start` and > > > > `window_end` > > > > column a time attribute column automatically? > > > > - If not, what will be the time attribute of the result relation of > > > > these > > > > TVFs? > > > > Especially after the window aggregation. > > > > - If yes, then how do we handle proctime? > > > > > > > > Regarding batch operators, > > > > It's great to hear that we can reuse the batch operators in > > > continuous > > > > batch mode > > > > as you mentioned in the FLIP. > > > > Current window aggregate could also be used in batch mode with > > > > rowtime. Do > > > > you plan > > > > to support these TVFs for batch mode in this FLIP? Hence the > > > Table/SQL > > > > is a > > > > unified > > > > API, it's great if we can keep the features complete both in > > > streaming > > > > and > > > > batch mode. > > > > > > > > There is one more question, I don't know whether it should be > > > > considered in > > > > this FLIP. > > > > Does the new window support `grouping sets`? (It's not supported in > > > old > > > > window impl). > > > > > > > > Jark Wu <imj...@gmail.com<mailto:imj...@gmail.com>> 于2020年10月9日周五 > > > 下午4:14写道: > > > > > > > > > Hi all, > > > > > > > > > > I know we have a lot of discussion and development on going right > > > > now but > > > > > it would be great if we can get FLIP-145 into a votable state. > > > > > If there are no objections, I would like to start voting in the > > > next > > > > days. > > > > > > > > > > Best, > > > > > Jark > > > > > > > > > > On Thu, 1 Oct 2020 at 14:29, Jark Wu <imj...@gmail.com<mailto: > > > imj...@gmail.com>> wrote: > > > > > > > > > > > Hi everyone, > > > > > > > > > > > > I have added a section for Performance Optimization to describe > > > > how to > > > > > > improve the performance in the short-term and long-term > > > > > > and sketch the future performance potential under the new window > > > > API. > > > > > > Introducing the window API is just the first step, we will > > > > > > continuously improve the performance to make it powerful and > > > > useful. > > > > > > > > > > > > Best, > > > > > > Jark > > > > > > > > > > > > On Thu, 1 Oct 2020 at 14:28, Jark Wu <imj...@gmail.com<mailto: > > > imj...@gmail.com>> wrote: > > > > > > > > > > > > > Hi Pengcheng, > > > > > > > > > > > > > > Yes, the window TVF is part of the FLIP. Welcome to contribute > > > > and join > > > > > > > the discussion. > > > > > > > Regarding the SESSION window aggregation, users can use the > > > > existing > > > > > > > grouped session window function. > > > > > > > > > > > > > > Best, > > > > > > > Jark > > > > > > > > > > > > > > On Sun, 27 Sep 2020 at 21:24, liupengcheng < > > > > pengchengliucr...@gmail.com<mailto:pengchengliucr...@gmail.com> > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > Hi Jark, > > > > > > > > Thanks for reply, yes, I think it's a good feature, it > > > > can > > > > > > > > improve the NRT scenarios > > > > > > > > as you mentioned in the FLIP. Also, I think it can > > > > improve the > > > > > > > > streaming SQL greatly, > > > > > > > > it can support richer window operations in flink SQL > > > and > > > > bring > > > > > > > > great convenience to users. > > > > > > > > (we are now only supported group window in flink). > > > > > > > > > > > > > > > > Regarding the SESSION window, I think it's especially > > > > useful > > > > > for > > > > > > > > user behavior analysis(e.g. > > > > > > > > counting user visits on a news website or social > > > > platform), but > > > > > > > > I agree that we can keep it > > > > > > > > out of the FLIP now to catch up 1.12. > > > > > > > > > > > > > > > > Recently, I've done some work on the stream planner > > > with > > > > the > > > > > > > > TVFs, and I'm willing to contribute > > > > > > > > to this part. Is it in the plan of this FLIP? > > > > > > > > > > > > > > > > Best, > > > > > > > > PengchengLiu > > > > > > > > > > > > > > > > > > > > > > > > 在 2020/9/26 下午11:09,“Jark Wu”<imj...@gmail.com<mailto: > > > imj...@gmail.com>> 写入: > > > > > > > > > > > > > > > > Hi pengcheng, > > > > > > > > > > > > > > > > That's great to see you also have the need of window join. > > > > > > > > You are right, the windowing TVF is a powerful feature > > > which > > > > can > > > > > > > > support > > > > > > > > more operations in the future. > > > > > > > > I think it as of the date time "partition" selection in > > > > batch SQL > > > > > > > > jobs, > > > > > > > > with this new syntax, I think it is possible > > > > > > > > to migrate traditional batch SQL jobs to Flink SQL by > > > > changing a > > > > > > > > few lines. > > > > > > > > > > > > > > > > Regarding the SESSION window, this is on purpose to keep > > > it > > > > out of > > > > > > > > the > > > > > > > > FLIP, because we want to keep the > > > > > > > > FLIP small to catch up 1.12 and SESSION TVF is rarely > > > useful > > > > (e.g. > > > > > > > > session > > > > > > > > window join?). > > > > > > > > > > > > > > > > Best, > > > > > > > > Jark > > > > > > > > > > > > > > > > On Fri, 25 Sep 2020 at 22:59, liupengcheng < > > > > > > > > pengchengliucr...@gmail.com<mailto: > > > pengchengliucr...@gmail.com>> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Hi, Jark, > > > > > > > > > I'm very interested in this feature, and I'm > > > also > > > > working > > > > > > > > on this > > > > > > > > > recently. > > > > > > > > > I just have a glance at the FLIP, it's good, > > > but I > > > > found > > > > > > > > that > > > > > > > > > there is no plan to add SESSION windows. > > > > > > > > > Also, I think there can be more things we can do > > > > based on > > > > > > > > this new > > > > > > > > > syntax. For example, > > > > > > > > > - window sort support > > > > > > > > > - window union/intersect/minus support > > > > > > > > > - Improve dimension table join > > > > > > > > > We can have more deep discussion on this new > > > > feature > > > > > later > > > > > > > > . > > > > > > > > > I've also opened an jira that is related to this > > > > feature > > > > > > > > recently: > > > > > > > > > https://issues.apache.org/jira/browse/FLINK-18830 > > > > > > > > > > > > > > > > > > Best! > > > > > > > > > PengchengLiu > > > > > > > > > > > > > > > > > > 在 2020/9/25 下午10:30,“Jark Wu”<imj...@gmail.com<mailto: > > > imj...@gmail.com>> 写入: > > > > > > > > > > > > > > > > > > Hi everyone, > > > > > > > > > > > > > > > > > > I want to start a FLIP about supporting windowing > > > > > table-valued > > > > > > > > > functions > > > > > > > > > (TVF). > > > > > > > > > The main purpose of this FLIP is to improve the near > > > > > real-time > > > > > > > > (NRT) > > > > > > > > > experience of Flink. > > > > > > > > > > > > > > > > > > FLIP-145: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function > > > > > > > > > > > > > > > > > > We want to introduce TUMBLE, HOP, CUMULATE windowing > > > > TVFs, > > > > > the > > > > > > > > > CUMULATE is > > > > > > > > > a new kind of window. > > > > > > > > > With the windowing TVFs, we can support richer > > > > operations on > > > > > > > > windows, > > > > > > > > > including window join, window TopN and so on. > > > > > > > > > This makes things simple: we only need to assign > > > > windows at > > > > > the > > > > > > > > > beginning > > > > > > > > > of the query, and then apply operations after that > > > like > > > > > > > > traditional > > > > > > > > > batch > > > > > > > > > SQL. > > > > > > > > > We hope it can help to reduce the learning curve of > > > > windows, > > > > > > > > improve > > > > > > > > > NRT > > > > > > > > > for Flink, and attract more batch users. > > > > > > > > > > > > > > > > > > A simple code snippet for 10 minutes tumbling window > > > > > aggregate: > > > > > > > > > > > > > > > > > > SELECT window_start, window_end, SUM(price) > > > > > > > > > FROM TABLE( > > > > > > > > > TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL > > > > '10' > > > > > > > > MINUTES)) > > > > > > > > > GROUP BY window_start, window_end; > > > > > > > > > > > > > > > > > > I'm looking forward to your feedback. > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > Jark > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > Best, > > > > Benchao Li > > > > > > > > > > > > > -- > > > > > > Best, > > > Benchao Li > > > > > > > > > -- > > > > > > Best, > > > Benchao Li > > > > > > > -- > > Best, > Benchao Li