As Fabian says, the key thing is to decide whether you want one-record-per-group or one-record-per-input-record. Only use GROUP BY if you want the former behavior.
Slides 16 onwards in http://www.slideshare.net/julianhyde/streaming-sql-at-flinkforward-berlin-20160912 <http://www.slideshare.net/julianhyde/streaming-sql-at-flinkforward-berlin-20160912> have some diagrams that may help understand the various kinds of windows. Julian > On Sep 27, 2016, at 4:30 AM, Fabian Hueske <[email protected]> wrote: > > Sliding windows can be partitioned but this definition does not go into a > GROUP BY clause but into a PARTITION BY clause added to OVER. > GROUP BY returns one record per group which is not what you want. > Check the syntax of the OVER clause, e.g., here [1] (see windowSpec at the > end). > > [1] http://calcite.apache.org/docs/reference.html > > > > 2016-09-27 13:20 GMT+02:00 Radu Tudoran <[email protected]>: > >> Hi, >> >> Thanks for the answer. >> As a follow up question - is it possible to use a GROUP BY clause after >> the previous query? >> >> SELECT SREAM orderId, price, AVG(price) OVER (ORDER BY orderTime ROWS >> 5 PRECEDING) >> FROM Orders >> GROUP BY order_type >> >> >> I am asking this from the perspective of knowing if this would enable to >> implement this over KeyedStream(s) such as the ones in Flink + the windows >> >> >> >> -----Original Message----- >> From: Fabian Hueske [mailto:[email protected]] >> Sent: Tuesday, September 27, 2016 11:32 AM >> To: [email protected] >> Subject: Re: New type of window semantics >> >> Hi Radu, >> >> sliding windows as described by Julian will emit exactly one row for each >> incoming row. >> In the scenario you describe only one row will be emitted when ordN6 >> arrives (otherwise, each input row would result in five emitted rows). >> >> So sliding windows seem to be what you are looking for. >> >> Best, Fabian >> >> 2016-09-27 10:59 GMT+02:00 Radu Tudoran <[email protected]>: >> >>> Hi, >>> >>> Thanks for this points. >>> I am not sure if I really understood the implications of using this >>> option in the stream mode. I got the point that if we have 20 rows >>> then we have 20 outputs. However, I wonder what happens when a new >>> record comes in and we have the query you proposed >>> >>> SELECT SREAM orderId, price, AVG(price) OVER (ORDER BY orderTime ROWS >>> 5 >>> PRECEDING) >>> FROM Orders >>> >>> >>> Assuming we have up to moment T te following 5 orders: >>> >>> ordN1, ordN2, ordN3, ordN4, ordN5 >>> >>> and we get ordN6 at moment T+1 >>> ..will the query provide only one result corresponding to ordN6 and >>> thus average over ordN2, ordN3, ordN4, ordN5, ordN6....or because >>> ordN2 to ordN5 are still in the system the query will return 5 results? >>> >>> >>> If the query answer is 1 output in this case corersponding to element >>> ordN6 then indeed it can do the job for this scenario. >>> >>> >>> >>> >>> >>> -----Original Message----- >>> From: Julian Hyde [mailto:[email protected]] >>> Sent: Tuesday, September 27, 2016 2:41 AM >>> To: [email protected] >>> Subject: Re: New type of window semantics >>> >>> Have you considered the sliding window, which is already part of >>> standard SQL? We propose to support it in streaming SQL also. Here is >> an example: >>> >>> SELECT orderId, price, AVG(price) OVER (ORDER BY orderTime ROWS 5 >>> PRECEDING) >>> FROM Orders >>> >>> (This is a non-streaming query, but you can add the STREAM keyword to >>> get a streaming query.) >>> >>> Given orders 1 .. 20, then order 10 would show the average for orders 5 >> .. >>> 10 inclusive, order 11 would show the average for orders 6 .. 11, and >>> so forth. >>> >>> In streaming queries, windows are often used in the GROUP BY clause, >>> but we do not use a GROUP BY here. The OVER clause with sliding >>> windows does not aggregate rows. If 20 rows come in, then 20 rows go >>> out. It makes sense, because each row cannot have its own window if >>> multiple rows are squashed into one. >>> >>> Julian >>> >>> >>> >>>> On Sep 26, 2016, at 12:53 AM, Radu Tudoran <[email protected]> >>> wrote: >>>> >>>> Hi, >>>> >>>> First of all let me introduce myself - My name is Radu Tudoran and I >>>> am >>> working in the field of Big Data processing with a high focus on >>> streaming and more recently in the area of SQL. >>>> >>>> I wanted to raise a question/proposal for discussion in the community: >>>> >>>> Based on our requirements I realized that I would need to create a >>> window (e.g. hop window) that would move on every incoming element based. >>> The syntax that I have in mind for it is >>>> >>>> HOP(column_name, # EVENT , INTERVAL # ) (or should it rather be # >>> ELEMENT instead of EVENT?) >>>> >>>> I wanted to check with you what do you think about such a grammar to >>>> go >>> directly in Calcite? I think it is relevant for streaming scenarios >>> where you do not necessary have events coming at regular time interval >>> but you would still like to react on every event. >>>> As an example you can consider a stock market application where you >>> would always compute for every new offer the average over the last hour. >>>> >>>> Best regards, >>> >>> >>
