[
https://issues.apache.org/jira/browse/CALCITE-1237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310739#comment-15310739
]
Julian Hyde commented on CALCITE-1237:
--------------------------------------
You're right, {{session}} is more like a ranking function than an aggregate
function. You're also right that to be strict, we'd need to have an {{OVER}}
clause. Putting that together, our example becomes
{code}
select stream session_start, session_end, productId, count(*) as c
from Orders
group by productId, session(interval '5' second) over (order by rowtime)
{code}
Because {{productId}} occurs earlier than {{session}} in the {{GROUP BY}}
clause, we only consider sequences of rows to be in a session if they have the
same {{productId}} value. So, {{session}} is "seeing" earlier items in the
{{GROUP BY}} clause.
I agree that we should disallow {{NTILE}} if the number of elements in a
partition is infinite. But there are ways to make that partition finite. First,
in the {{GROUP BY}} case:
{code}
SELECT ...
FROM Orders
GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), NTILE(units) OVER(rowtime)
{code}
Next, in the windowed aggregation case:
{code}
SELECT STREAM productId,
NTILE(units) OVER (ORDER BY rowtime
RANGE INTERVAL '1' HOUR PRECEDING
PARTITION BY productId)
FROM Orders
{code}
> Session windows for streaming SQL
> ---------------------------------
>
> Key: CALCITE-1237
> URL: https://issues.apache.org/jira/browse/CALCITE-1237
> Project: Calcite
> Issue Type: Bug
> Components: stream
> Reporter: Julian Hyde
> Assignee: Julian Hyde
>
> A session window is a collection of rows whose key values, when sorted, have
> a gap of at most N.
> Q1. Should "at most" be "less than"?
> The key type can be any type that has a minus operator, that is, numeric and
> date-time.
> I propose the following syntax: {{session(key [, ...]*, interval)}}. For
> example:
> {code}
> select stream session(rowtime, productId, interval '5' second),
> productId, count(*) as c
> from Orders
> group by session(rowtime, productId, interval '5' second),
> productId
> {code}
> to find bursts of orders for the same product where consecutive orders are no
> more than 5 seconds apart.
> The first key column {{rowtime}} defines the session and must be of
> numeric/date-time type, and must have monotonicity or similar in order for
> the query to make progress; the other key columns (in this case
> {{productId}}) can be of any type; the last column is the interval, and must
> be constant.
> The {{session}} function returns the key value at the start of the window.
> Unlike the {{hop}} function, each row belongs to precisely one window. But
> {{session}} is not a true function, because its value depends on the records
> flowing in the stream.
> Q2. If {{session}} is used, should we allow order-dependent aggregate
> functions such as {{first_value}}?
> Q3. Should we allow {{session}} as a windowed aggregate function?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)