[
https://issues.apache.org/jira/browse/FLINK-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15825546#comment-15825546
]
sunjincheng commented on FLINK-5386:
------------------------------------
Hi [~fhueske] [~shaoxuan], thanks for the reply.
[~fhueske] You are right: whether it is a stream table or a batch table, we
need to ensure correctness. As you said, we must check the window's
properties in the implementation phase. I agree with you.
BTW, "groupBy('w)" is consistent not only with the row-window, but also
with Calcite SQL. For instance:
GroupBy:
{code}
SELECT STREAM TUMBLE_END(rowtime, INTERVAL '1' HOUR) AS rowtime,
  productId,
  COUNT(*) AS c,
  SUM(units) AS units
FROM Orders
GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), productId;
{code}
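To make the GroupBy semantics concrete, here is a minimal plain-Scala sketch of what that query computes. It is not Calcite or Flink code: the `Order` case class, the `TumbleSketch` object, and the millisecond timestamps are assumptions made only for illustration.

```scala
object TumbleSketch {
  // Hypothetical row type mirroring the Orders columns used in the query.
  final case class Order(rowtime: Long, productId: Int, units: Int)

  val HourMs: Long = 60L * 60L * 1000L

  // Exclusive end of the tumbling window of length `size` that contains `ts`.
  def tumbleEnd(ts: Long, size: Long): Long =
    ts - Math.floorMod(ts, size) + size

  // Emulates GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), productId.
  // Result: (windowEnd, productId) -> (COUNT(*), SUM(units)).
  def aggregate(orders: Seq[Order]): Map[(Long, Int), (Int, Int)] =
    orders
      .groupBy(o => (tumbleEnd(o.rowtime, HourMs), o.productId))
      .map { case (key, rows) => key -> ((rows.size, rows.map(_.units).sum)) }
}
```

The point of the sketch is that the window is just one more grouping key next to productId, which is what "groupBy('w, 'key)" expresses.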
Over:
{code}
SELECT STREAM *
FROM (
  SELECT STREAM rowtime,
    productId,
    units,
    AVG(units) OVER product (RANGE INTERVAL '10' MINUTE PRECEDING) AS m10,
    AVG(units) OVER product (RANGE INTERVAL '7' DAY PRECEDING) AS d7
  FROM Orders
  WINDOW product AS (
    ORDER BY rowtime
    PARTITION BY productId))
WHERE m10 > d7;
{code}
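For comparison, the Over case can be sketched in plain Scala as follows. Again this is only an illustration under assumptions (the `Order` type, the `OverSketch` object, and millisecond timestamps are hypothetical, and the quadratic scan is chosen for readability, not efficiency). Each input row produces one output row, and the average is taken over rows of the same product whose rowtime lies within the preceding range.

```scala
object OverSketch {
  // Hypothetical row type mirroring the Orders columns used in the query.
  final case class Order(rowtime: Long, productId: Int, units: Int)

  // Emulates AVG(units) OVER (PARTITION BY productId ORDER BY rowtime
  //                           RANGE INTERVAL <rangeMs> PRECEDING).
  // Every input row is kept and paired with its own frame average.
  def rangeAvg(orders: Seq[Order], rangeMs: Long): Seq[(Order, Double)] =
    orders.map { o =>
      val frame = orders.filter { p =>
        p.productId == o.productId &&
        p.rowtime >= o.rowtime - rangeMs &&
        p.rowtime <= o.rowtime
      }
      // The frame always contains the current row, so it is never empty.
      (o, frame.map(_.units.toDouble).sum / frame.size)
    }
}
```

Unlike the GroupBy sketch, nothing is collapsed here: the window only defines which neighbouring rows feed each aggregate.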
The following two statements are supported by the current changes:
#1. windows are defined at the start and used later:
{code}
val windowedTable = table
  .window(Slide over 10.milli every 5.milli as 'w1)
  .window(Tumble over 5.milli as 'w2)
  .groupBy('w1, 'key)
  .select('string, 'int.count as 'count, 'w1.start)
  .groupBy('w2, 'key)
  .select('string, 'count.sum as 'sum2)
{code}
#2. windows are defined with groupBy:
{code}
val windowedTable = table
  .window(Slide over 10.milli every 5.milli as 'w1)
  .groupBy('w1, 'key)
  .select('string, 'int.count as 'count, 'w1.start)
  .window(Tumble over 5.milli as 'w2)
  .groupBy('w2, 'key)
  .select('string, 'count.sum as 'sum2)
{code}
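As a reminder of what "Slide over 10.milli every 5.milli" means for a single record, here is a small plain-Scala sketch of sliding-window assignment. It assumes windows are aligned to timestamp 0 and are half-open intervals [start, start + size); `SlideSketch` and `slidingWindowStarts` are hypothetical names, not part of the Table API.

```scala
object SlideSketch {
  // Start timestamps of all sliding windows [start, start + size)
  // that contain `ts`, newest window first.
  def slidingWindowStarts(ts: Long, size: Long, slide: Long): List[Long] = {
    // Latest window start that is not after ts.
    val lastStart = ts - Math.floorMod(ts, slide)
    Iterator
      .iterate(lastStart)(_ - slide)
      .takeWhile(start => start > ts - size)
      .toList
  }
}
```

With size 10 and slide 5, `SlideSketch.slidingWindowStarts(7L, 10L, 5L)` yields `List(5, 0)`, so each record contributes to two 'w1 groups before the result is grouped again by 'w2.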
I hope this makes sense to you.
When you said "by tying window and groupBy together, we could avoid such
situations", did you mean something like #2, or must it be written as
"groupBy().window()"?
References:
Azure: https://msdn.microsoft.com/en-us/library/azure/dn835051.aspx
Calcite: http://calcite.apache.org/docs/stream.html#tumbling-windows
> Refactoring Window Clause
> -------------------------
>
> Key: FLINK-5386
> URL: https://issues.apache.org/jira/browse/FLINK-5386
> Project: Flink
> Issue Type: Sub-task
> Components: Table API & SQL
> Reporter: sunjincheng
> Assignee: sunjincheng
>
> Similar to SQL, the window clause is defined "as" a symbol which is then
> explicitly used in groupBy/over. We are proposing to refactor the way
> groupBy+window is written in the Table API as follows:
> {code}
> val windowedTable = table
>   .window(Slide over 10.milli every 5.milli as 'w1)
>   .window(Tumble over 5.milli as 'w2)
>   .groupBy('w1, 'key)
>   .select('string, 'int.count as 'count, 'w1.start)
>   .groupBy('w2, 'key)
>   .select('string, 'count.sum as 'sum2)
>   .window(Tumble over 5.milli as 'w3)
>   .groupBy('w3) // windowAll
>   .select('sum2, 'w3.start, 'w3.end)
> {code}
> In this way, we can remove both GroupWindowedTable and the window() method in
> GroupedTable, which makes the API a bit cleaner. In addition, for row-windows
> we need to define the window clause as a symbol anyway. This change will make
> the APIs of window and row-window consistent. An example for row-window:
> {code}
> .window(RowXXXWindow as 'x, RowYYYWindow as 'y)
> .select('a, 'b.count over 'x as 'xcnt, 'c.count over 'y as 'ycnt, 'x.start,
> 'x.end)
> {code}
> What do you think? [~fhueske] [~twalthr]