Hi everyone,

we are very happy to support TUMBLE/HOP/SESSION in our upcoming Flink 1.3 release. However, there are some problems regarding nested window queries that we would like to discuss with the Calcite community.

Take the following query:

SELECT
  rowtime, SUM(x)
FROM (
  SELECT
     TUMBLE_END(rowtime, INTERVAL '2' MINUTE) AS rowtime,
     MIN(x) AS x
  FROM MyTable
  GROUP BY TUMBLE(rowtime, INTERVAL '2' MINUTE)
)
GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR)


Initially, we thought that we can use the xxx_END() group auxiliary functions to define the rowtime for the upper query. However, according to http://calcite.apache.org/docs/stream.html, TUMBLE_END should return the timestamp of the exclusive window end, i.e., for a window of 1 hour that contains all elements from 12:00:00.000 until 12:59:59.999 (inclusive), TUMBLE_END would return 13:00:00.000. The problem is that Flink uses the inclusive window end as new timestamp. The reason for that is that if you do preaggregation with a window, say 5 minute windows which later will be aggregated into 1 hour windows, the last 5 minute window (from 12:55:00.000 until 12:59:59.999 incl) would have a timestamp of 13:00:00.000 and fall into the next window starting at 13:00:00.000.


The question is how Calcite is planning to support nested windows. Right now we see the following options:

- TUMBLE_END returns the inclusive window end

- we introduce an additional group auxiliary function for the inclusive window end like: SELECT TUMBLE_TIME(rowtime, INTERVAL '2' MINUTE) AS rowtime ...

- we allow references to the window in the select: SELECT TUMBLE(rowtime, INTERVAL '1' HOUR) AS rowtime ...

What do you think?


Regards,

Timo




Reply via email to