[
https://issues.apache.org/jira/browse/BEAM-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Beam JIRA Bot updated BEAM-4702:
--------------------------------
Labels: stale-P2 (was: )
> After SQL GROUP BY <windowing> the result should be globally windowed
> ---------------------------------------------------------------------
>
> Key: BEAM-4702
> URL: https://issues.apache.org/jira/browse/BEAM-4702
> Project: Beam
> Issue Type: New Feature
> Components: dsl-sql
> Reporter: Kenneth Knowles
> Priority: P2
> Labels: stale-P2
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Beam SQL runs in two contexts:
> 1. As a PTransform in a pipeline. A PTransform operates on a PCollection,
> which is always implicitly windows and a PTransform should operate per-window
> so it automatically works on bounded and unbounded data. This only works if
> the query has no windowing operators, in which case the GROUP BY <non-window
> stuff> should operate per-window.
> 2. As a top-level shell that starts and ends with SQL. In the relational
> model there are no implicit windows. Calcite has some extensions for
> windowing, but they manifest (IMO correctly) as just items in the GROUP BY
> list. The output of the aggregation is "just rows" again. So it should be
> globally windowed.
> The problem is that this semantic fix makes it so we cannot join windowing
> stream subqueries. Because we don't have retractions, we only support
> GroupByKey-based equijoins over windowed streams, with the default trigger.
> _These joins implicitly also join windows_. For example:
> {code}
> JOIN(left.id = right.id)
> SELECT ... GROUP BY id, TUMBLE(1 hour)
> SELECT ... GROUP BY id, TUMBLE(1 hour)
> {code}
> Semantically, there may be a joined row for 1:00pm on the left and 10:00pm on
> the right. But by the time the right-hand row for 10:00pm shows up, the left
> one may be GC'd. So this is implicitly, but nondeterministically, joining on
> the window as well. Before this PR, we left the windowing strategies for left
> and right in place, and asserted that they matched.
> If we re-window into the global window always, there _are no windowed
> streams_ so you just can't do stream joins. The solution is probably to track
> which field of a stream is the window and allow joins which also explicitly
> express the equijoin over the window field.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)