Sliding windows can be partitioned but this definition does not go into a GROUP BY clause but into a PARTITION BY clause added to OVER. GROUP BY returns one record per group which is not what you want. Check the syntax of the OVER clause, e.g., here [1] (see windowSpec at the end).
[1] http://calcite.apache.org/docs/reference.html 2016-09-27 13:20 GMT+02:00 Radu Tudoran <[email protected]>: > Hi, > > Thanks for the answer. > As a follow up question - is it possible to use a GROUP BY clause after > the previous query? > > SELECT SREAM orderId, price, AVG(price) OVER (ORDER BY orderTime ROWS > 5 PRECEDING) > FROM Orders > GROUP BY order_type > > > I am asking this from the perspective of knowing if this would enable to > implement this over KeyedStream(s) such as the ones in Flink + the windows > > > > -----Original Message----- > From: Fabian Hueske [mailto:[email protected]] > Sent: Tuesday, September 27, 2016 11:32 AM > To: [email protected] > Subject: Re: New type of window semantics > > Hi Radu, > > sliding windows as described by Julian will emit exactly one row for each > incoming row. > In the scenario you describe only one row will be emitted when ordN6 > arrives (otherwise, each input row would result in five emitted rows). > > So sliding windows seem to be what you are looking for. > > Best, Fabian > > 2016-09-27 10:59 GMT+02:00 Radu Tudoran <[email protected]>: > > > Hi, > > > > Thanks for this points. > > I am not sure if I really understood the implications of using this > > option in the stream mode. I got the point that if we have 20 rows > > then we have 20 outputs. However, I wonder what happens when a new > > record comes in and we have the query you proposed > > > > SELECT SREAM orderId, price, AVG(price) OVER (ORDER BY orderTime ROWS > > 5 > > PRECEDING) > > FROM Orders > > > > > > Assuming we have up to moment T te following 5 orders: > > > > ordN1, ordN2, ordN3, ordN4, ordN5 > > > > and we get ordN6 at moment T+1 > > ..will the query provide only one result corresponding to ordN6 and > > thus average over ordN2, ordN3, ordN4, ordN5, ordN6....or because > > ordN2 to ordN5 are still in the system the query will return 5 results? > > > > > > If the query answer is 1 output in this case corersponding to element > > ordN6 then indeed it can do the job for this scenario. > > > > > > > > > > > > -----Original Message----- > > From: Julian Hyde [mailto:[email protected]] > > Sent: Tuesday, September 27, 2016 2:41 AM > > To: [email protected] > > Subject: Re: New type of window semantics > > > > Have you considered the sliding window, which is already part of > > standard SQL? We propose to support it in streaming SQL also. Here is > an example: > > > > SELECT orderId, price, AVG(price) OVER (ORDER BY orderTime ROWS 5 > > PRECEDING) > > FROM Orders > > > > (This is a non-streaming query, but you can add the STREAM keyword to > > get a streaming query.) > > > > Given orders 1 .. 20, then order 10 would show the average for orders 5 > .. > > 10 inclusive, order 11 would show the average for orders 6 .. 11, and > > so forth. > > > > In streaming queries, windows are often used in the GROUP BY clause, > > but we do not use a GROUP BY here. The OVER clause with sliding > > windows does not aggregate rows. If 20 rows come in, then 20 rows go > > out. It makes sense, because each row cannot have its own window if > > multiple rows are squashed into one. > > > > Julian > > > > > > > > > On Sep 26, 2016, at 12:53 AM, Radu Tudoran <[email protected]> > > wrote: > > > > > > Hi, > > > > > > First of all let me introduce myself - My name is Radu Tudoran and I > > > am > > working in the field of Big Data processing with a high focus on > > streaming and more recently in the area of SQL. > > > > > > I wanted to raise a question/proposal for discussion in the community: > > > > > > Based on our requirements I realized that I would need to create a > > window (e.g. hop window) that would move on every incoming element based. > > The syntax that I have in mind for it is > > > > > > HOP(column_name, # EVENT , INTERVAL # ) (or should it rather be # > > ELEMENT instead of EVENT?) > > > > > > I wanted to check with you what do you think about such a grammar to > > > go > > directly in Calcite? I think it is relevant for streaming scenarios > > where you do not necessary have events coming at regular time interval > > but you would still like to react on every event. > > > As an example you can consider a stock market application where you > > would always compute for every new offer the average over the last hour. > > > > > > Best regards, > > > > >
