Hi Feng, Thanks a lot for raising this point.
Yes, your understanding is correct. WindowStagger shifts not only the firing time, but also the window boundaries themselves. For example, with a 10-minute tumbling window and a 2-minute stagger offset, the windows become: - ..., [23:52-00:02), [00:02-00:12), ... instead of globally aligned windows: - ..., [00:00-00:10), [00:10-00:20), ... With KEY_BASED stagger, different keys may therefore have different window_start / window_end values. However, we believe this still preserves the semantics of tumbling windows. The core semantics of tumbling windows are: - fixed-size windows - non-overlapping windows - each record belongs to exactly one window KEY_BASED stagger does not change these properties. It only changes the alignment strategy from globally aligned boundaries to deterministic per-key boundaries. This is conceptually similar to the existing global offset support in TUMBLE TVF: - global offset: shifts all tumbling window boundaries together - stagger offset: further distributes boundaries across tasks/keys Both preserve tumbling window semantics. In addition, WindowStagger itself was already introduced by FLINK-12855 [1]. This FLIP mainly: 1. adds SQL support for WindowStagger 2. introduces a deterministic stagger type (KEY_BASED) The motivation for KEY_BASED is that existing RANDOM/NATURAL stagger strategies are non-deterministic across recovery, which may assign the same record to different windows before and after restart. KEY_BASED keeps the load distribution benefit while making window assignment deterministic. [1] https://issues.apache.org/jira/browse/FLINK-12855 Best, Zihao Feng Jin <[email protected]> 于2026年5月8日周五 10:17写道: > Hi Zihao, > > Thanks for the proposal. > > I would like to clarify one semantic point first. My understanding is that > WindowStagger shifts the window boundaries, rather than only delaying when > the > window is fired. > > For example, with a 10-minute TUMBLE window and a 2-minute stagger, the > windows > would become [..., 23:52-00:02), [00:02-00:12), ... instead of the globally > aligned [00:00-00:10), [00:10-00:20), ... > > If this is correct, KEY_BASED stagger would mean that different keys may > have > different window_start/window_end values. This seems to affect the > observable SQL > TUMBLE semantics, instead of being only a trigger-time load-smoothing > optimization. > > Could you clarify whether this is the intended behavior? > > Best, > Feng > > > > On Tue, Apr 7, 2026 at 3:14 PM zihao chen <[email protected]> wrote: > > > Hi all, > > > > I would like to start a discussion on *FLIP-XXX: Support Window Stagger > in > > FlinkSQL and introduce KEY_BASED deterministic stagger*. > > > > Currently, Flink provides WindowStagger to distribute window trigger time > > and reduce burst load (see FLINK-12855: Stagger > > TumblingProcessingTimeWindow processing to distribute workload > > <https://issues.apache.org/jira/browse/FLINK-12855>). However, there are > > two limitations in production usage: > > > > 1. WindowStagger is not supported in Flink SQL TVF tumbling windows > > 2. Existing stagger strategies (e.g., RANDOM, NATURAL) are > > non-deterministic and may assign records to different windows after > > recovery, leading to inconsistent results > > > > This FLIP proposes: > > > > 1. Add WindowStagger *support in Flink SQL TUMBLE TVF* > > 2. Introduce a new deterministic stagger strategy: *KEY_BASED* > > - The stagger offset is computed from the key > > - Ensures consistent window assignment before and after recovery > > > > Example syntax: > > > > TUMBLE(TABLE t, DESCRIPTOR(rowtime), INTERVAL '1' HOUR, 'RANDOM') > > > > TUMBLE(TABLE t, DESCRIPTOR(rowtime), INTERVAL '1' HOUR, INTERVAL '5' > > MINUTE, 'RANDOM') > > > > The detailed designs are described in the FLIP document: > > > > > > > https://docs.google.com/document/d/12NX_s-2HI1C9qjw4two0OHeYfX8YBMiBsq1X6lCfiW0/edit?tab=t.0#heading=h.alqmv4gpvbz7 > > > > Looking forward to your feedback. > > > > > > Best regards, > > > > Zihao Chen > > >
