Re: New type of window semantics

Julian Hyde Tue, 27 Sep 2016 14:03:28 -0700

As Fabian says, the key thing is to decide whether you want 
one-record-per-group or one-record-per-input-record. Only use GROUP BY if you 
want the former behavior.


Slides 16 onwards in 
http://www.slideshare.net/julianhyde/streaming-sql-at-flinkforward-berlin-20160912
 
<http://www.slideshare.net/julianhyde/streaming-sql-at-flinkforward-berlin-20160912>
 have some diagrams that may help understand the various kinds of windows.

Julian


> On Sep 27, 2016, at 4:30 AM, Fabian Hueske <[email protected]> wrote:
> 
> Sliding windows can be partitioned but this definition does not go into a
> GROUP BY clause but into a PARTITION BY clause added to OVER.
> GROUP BY returns one record per group which is not what you want.
> Check the syntax of the OVER clause, e.g., here [1] (see windowSpec at the
> end).
> 
> [1] http://calcite.apache.org/docs/reference.html
> 
> 
> 
> 2016-09-27 13:20 GMT+02:00 Radu Tudoran <[email protected]>:
> 
>> Hi,
>> 
>> Thanks for the answer.
>> As a follow up question - is it possible to use a GROUP BY clause after
>> the previous query?
>> 
>> SELECT SREAM orderId, price, AVG(price) OVER (ORDER BY orderTime ROWS
>> 5 PRECEDING)
>>   FROM Orders
>>   GROUP BY order_type
>> 
>> 
>> I am asking this from the perspective of knowing if this would enable to
>> implement this over KeyedStream(s) such as the ones in Flink + the windows
>> 
>> 
>> 
>> -----Original Message-----
>> From: Fabian Hueske [mailto:[email protected]]
>> Sent: Tuesday, September 27, 2016 11:32 AM
>> To: [email protected]
>> Subject: Re: New type of window semantics
>> 
>> Hi Radu,
>> 
>> sliding windows as described by Julian will emit exactly one row for each
>> incoming row.
>> In the scenario you describe only one row will be emitted when ordN6
>> arrives (otherwise, each input row would result in five emitted rows).
>> 
>> So sliding windows seem to be what you are looking for.
>> 
>> Best, Fabian
>> 
>> 2016-09-27 10:59 GMT+02:00 Radu Tudoran <[email protected]>:
>> 
>>> Hi,
>>> 
>>> Thanks for this points.
>>> I am not sure if I really understood the implications of using this
>>> option in the stream mode. I got the point that if we have 20 rows
>>> then we have 20 outputs. However, I wonder what happens when a new
>>> record comes in and we have the query you proposed
>>> 
>>> SELECT SREAM orderId, price, AVG(price) OVER (ORDER BY orderTime ROWS
>>> 5
>>> PRECEDING)
>>>  FROM Orders
>>> 
>>> 
>>> Assuming we have up to moment T te following 5 orders:
>>> 
>>> ordN1, ordN2, ordN3, ordN4, ordN5
>>> 
>>> and we get ordN6 at moment T+1
>>> ..will the query provide only one result corresponding to ordN6 and
>>> thus average over ordN2, ordN3, ordN4, ordN5, ordN6....or because
>>> ordN2 to ordN5 are still in the system the query will return 5 results?
>>> 
>>> 
>>> If the query answer is 1 output in this case corersponding to element
>>> ordN6 then indeed it can do the job for this scenario.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> -----Original Message-----
>>> From: Julian Hyde [mailto:[email protected]]
>>> Sent: Tuesday, September 27, 2016 2:41 AM
>>> To: [email protected]
>>> Subject: Re: New type of window semantics
>>> 
>>> Have you considered the sliding window, which is already part of
>>> standard SQL?  We propose to support it in streaming SQL also. Here is
>> an example:
>>> 
>>>  SELECT orderId, price, AVG(price) OVER (ORDER BY orderTime ROWS 5
>>> PRECEDING)
>>>  FROM Orders
>>> 
>>> (This is a non-streaming query, but you can add the STREAM keyword to
>>> get a streaming query.)
>>> 
>>> Given orders 1 .. 20, then order 10 would show the average for orders 5
>> ..
>>> 10 inclusive, order 11 would show the average for orders 6 .. 11, and
>>> so forth.
>>> 
>>> In streaming queries, windows are often used in the GROUP BY clause,
>>> but we do not use a GROUP BY here. The OVER clause with sliding
>>> windows does not aggregate rows. If 20 rows come in, then 20 rows go
>>> out. It makes sense, because each row cannot have its own window if
>>> multiple rows are squashed into one.
>>> 
>>> Julian
>>> 
>>> 
>>> 
>>>> On Sep 26, 2016, at 12:53 AM, Radu Tudoran <[email protected]>
>>> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> First of all let me introduce myself - My name is Radu Tudoran and I
>>>> am
>>> working in the field of Big Data processing with a high focus on
>>> streaming and more recently in the area of SQL.
>>>> 
>>>> I wanted to raise a question/proposal for discussion in the community:
>>>> 
>>>> Based on our requirements I realized that I would need to create a
>>> window (e.g. hop window) that would move on every incoming element based.
>>> The syntax that I have in mind for it is
>>>> 
>>>> HOP(column_name, # EVENT , INTERVAL # )   (or should it rather be #
>>> ELEMENT instead of EVENT?)
>>>> 
>>>> I wanted to check with you what do you think about such a grammar to
>>>> go
>>> directly in Calcite? I think it is relevant for streaming scenarios
>>> where you do not necessary have events coming at regular time interval
>>> but you would still like to react on every event.
>>>> As an example you can consider a stock market application where you
>>> would always compute for every new offer the average over the last hour.
>>>> 
>>>> Best regards,
>>> 
>>> 
>>

Re: New type of window semantics

Reply via email to