[
https://issues.apache.org/jira/browse/FLINK-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868220#comment-15868220
]
Fabian Hueske commented on FLINK-5655:
--------------------------------------
Hi [~wheat9],
First of all, the term sliding window is a bit overloaded. What we call Sliding
(Group) Window in the DataStream API is not the same as a Sliding (Row) Window
(hence your examples (1) and (2) are not the semantically equivalent!) I think
the sliding row window semantics are more common, but now we have the term in
Flink coined differently and I don't think there is consensus to change that.
For example this document from the Calcite community calls what Flink calls
"Sliding Windows" "Hopping Windows": http://calcite.apache.org/docs/stream.html
Sorry for the confusion.
It is possible to define sliding group windows (as described in FLIP-11) in
SQL, however, it is a bit cumbersome.
For instance a sliding window of size 5 minutes that slides every minute could
be defined as
{code}
SELECT SUM(b) OVER (PARTITION BY a ORDER BY rowtime ROWS BETWEEN 5 PRECEDING
AND CURRENT ROW)
FROM (
SELECT a, SUM(b) AS b, MAX(rowtime) AS rowtime
FROM tab
GROUP BY a, FLOOR(rowtime TO MINUTE)
)
{code}
This query basically first computes partial aggregates using a tumbling window
and then the final aggregates using a row window based on row counts.
However, there are a few issues with that.
- we do not want to support event-time OVER ROW windows because they might
cause very expensive updates for late data.
- this is very hard to translate to Flink's built-in windows (or the Table API
windows) because the logic is distributed across several operators.
Hope this helps, Fabian
> Add event time OVER RANGE BETWEEN x PRECEDING aggregation to SQL
> ----------------------------------------------------------------
>
> Key: FLINK-5655
> URL: https://issues.apache.org/jira/browse/FLINK-5655
> Project: Flink
> Issue Type: Sub-task
> Components: Table API & SQL
> Reporter: Fabian Hueske
> Assignee: Shaoxuan Wang
>
> The goal of this issue is to add support for OVER RANGE aggregations on event
> time streams to the SQL interface.
> Queries similar to the following should be supported:
> {code}
> SELECT
> a,
> SUM(b) OVER (PARTITION BY c ORDER BY rowTime() RANGE BETWEEN INTERVAL '1'
> HOUR PRECEDING AND CURRENT ROW) AS sumB,
> MIN(b) OVER (PARTITION BY c ORDER BY rowTime() RANGE BETWEEN INTERVAL '1'
> HOUR PRECEDING AND CURRENT ROW) AS minB
> FROM myStream
> {code}
> The following restrictions should initially apply:
> - All OVER clauses in the same SELECT clause must be exactly the same.
> - The PARTITION BY clause is optional (no partitioning results in single
> threaded execution).
> - The ORDER BY clause may only have rowTime() as parameter. rowTime() is a
> parameterless scalar function that just indicates processing time mode.
> - UNBOUNDED PRECEDING is not supported (see FLINK-5658)
> - FOLLOWING is not supported.
> The restrictions will be resolved in follow up issues. If we find that some
> of the restrictions are trivial to address, we can add the functionality in
> this issue as well.
> This issue includes:
> - Design of the DataStream operator to compute OVER ROW aggregates
> - Translation from Calcite's RelNode representation (LogicalProject with
> RexOver expression).
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)