Thanks Julian. I agree, semantically TUMBLE_START would work. Unfortunately, the window implementations of Flink's DataStream API (which are used to implement TUMBLE, HOP, and SESSION windows) assign (window_end - 1) as internal timestamp to records. The reason for this design decision is that otherwise we would need to hold back watermarks until a window ends to prevent that a window emits late data. Holding back watermarks is tricky (especially in case of non-aligned windows such as SESSION) because the emission of watermarks would need to be coordinated across the windows for all grouping keys that are processed by a parallel task instance.
Of course, these are internal details of Flink which are not directly related to the semantics of the window start/end functions, but they make the use of TUMBLE_START impractical for us (at least for now). As a workaround for the upcoming 1.3.0 release we added another method TUMBLE_ROWTIME that returns TUMBLE_END - 1 which can be used for time-based operations. Best, Fabian 2017-05-18 17:59 GMT+02:00 Julian Hyde <[email protected]>: > By the way, a principle I would follow when composing windows is that all > that comes out of a query is simply data values. I don’t think this is > controversial. But I don’t want anyone to suggest that we pass fuzzy > concepts like “window instances” from sub-query to enclosing query. > > Julian > > > > On May 18, 2017, at 11:53 AM, Julian Hyde <[email protected]> wrote: > > > > Have you considered using TUMBLE_START? This is an inclusive bound, so > you should be able to compose windows. > > > > It’s possible that TUMBLE_START isn’t returning the right value yet, > e.g. see https://insight.io/github.com/apache/calcite/blob/ > a11d14054e9c1d2ce22f60e11536f1885faaae7c/core/src/main/java/ > org/apache/calcite/sql2rel/AuxiliaryConverter.java#L56 < > https://insight.io/github.com/apache/calcite/blob/ > a11d14054e9c1d2ce22f60e11536f1885faaae7c/core/src/main/java/ > org/apache/calcite/sql2rel/AuxiliaryConverter.java#L56> but in principle > it’s the right thing to use. > > > > Julian > > > > > >> On May 17, 2017, at 4:46 AM, Timo Walther <[email protected] <mailto: > [email protected]>> wrote: > >> > >> Hi everyone, > >> > >> we are very happy to support TUMBLE/HOP/SESSION in our upcoming Flink > 1.3 release. However, there are some problems regarding nested window > queries that we would like to discuss with the Calcite community. > >> > >> Take the following query: > >> > >> SELECT > >> rowtime, SUM(x) > >> FROM ( > >> SELECT > >> TUMBLE_END(rowtime, INTERVAL '2' MINUTE) AS rowtime, > >> MIN(x) AS x > >> FROM MyTable > >> GROUP BY TUMBLE(rowtime, INTERVAL '2' MINUTE) > >> ) > >> GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR) > >> > >> > >> Initially, we thought that we can use the xxx_END() group auxiliary > functions to define the rowtime for the upper query. However, according to > http://calcite.apache.org/docs/stream.html <http://calcite.apache.org/ > docs/stream.html>, TUMBLE_END should return the timestamp of the > exclusive window end, i.e., for a window of 1 hour that contains all > elements from 12:00:00.000 until 12:59:59.999 (inclusive), TUMBLE_END would > return 13:00:00.000. The problem is that Flink uses the inclusive window > end as new timestamp. The reason for that is that if you do preaggregation > with a window, say 5 minute windows which later will be aggregated into 1 > hour windows, the last 5 minute window (from 12:55:00.000 until > 12:59:59.999 incl) would have a timestamp of 13:00:00.000 and fall into the > next window starting at 13:00:00.000. > >> > >> > >> The question is how Calcite is planning to support nested windows. > Right now we see the following options: > >> > >> - TUMBLE_END returns the inclusive window end > >> > >> - we introduce an additional group auxiliary function for the inclusive > window end like: SELECT TUMBLE_TIME(rowtime, INTERVAL '2' MINUTE) AS > rowtime ... > >> > >> - we allow references to the window in the select: SELECT > TUMBLE(rowtime, INTERVAL '1' HOUR) AS rowtime ... > >> > >> What do you think? > >> > >> > >> Regards, > >> > >> Timo > >> > >> > >> > >> > > > >
