Repository: flink
Updated Branches:
  refs/heads/master eed41e1fc -> d98ba08a7

[FLINK-6163] Document per-window state in ProcessWindowFunction

Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/d98ba08a
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/d98ba08a
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/d98ba08a

Branch: refs/heads/master
Commit: d98ba08a7a73dc93c839c77d11245aba70869ab5
Parents: eed41e1
Author: Aljoscha Krettek <[email protected]>
Authored: Fri Nov 10 10:54:16 2017 +0100
Committer: Aljoscha Krettek <[email protected]>
Committed: Fri Nov 10 15:55:31 2017 +0100

----------------------------------------------------------------------
 docs/dev/stream/operators/windows.md | 32 +++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/flink/blob/d98ba08a/docs/dev/stream/operators/windows.md
----------------------------------------------------------------------
diff --git a/docs/dev/stream/operators/windows.md b/docs/dev/stream/operators/windows.md
index 7966ec8..3c0cd85 100644
--- a/docs/dev/stream/operators/windows.md
+++ b/docs/dev/stream/operators/windows.md
@@ -978,6 +978,38 @@ input
 </div>
 </div>
 
+### Using per-window state in ProcessWindowFunction
+
+In addition to accessing keyed state (as any rich function can), a `ProcessWindowFunction` can
+also use keyed state that is scoped to the window that the function is currently processing. In this
+context it is important to understand which window *per-window* state refers to.
+There are different "windows" involved:
+
+ - The window that was defined when specifying the windowed operation: This might be *tumbling
+   windows of 1 hour* or *sliding windows of 2 hours that slide by 1 hour*.
+ - An actual instance of a defined window for a given key: This might be *time window from 12:00
+   to 13:00 for user-id xyz*. This is based on the window definition, and there will be many windows
+   based on the number of keys that the job is currently processing and based on what time slots
+   the events fall into.
+
+Per-window state is tied to the latter of those two. This means that if we process events for 1000
+different keys and events for all of them currently fall into the *[12:00, 13:00)* time window,
+then there will be 1000 window instances that each have their own keyed per-window state.
+
+There are two methods on the `Context` object that a `process()` invocation receives that allow
+access to the two types of state:
+
+ - `globalState()`, which allows access to keyed state that is not scoped to a window
+ - `windowState()`, which allows access to keyed state that is also scoped to the window
+
+This feature is helpful if you anticipate multiple firings for the same window, as can happen when
+you have late firings for late-arriving data or when you have a custom trigger that does
+speculative early firings. In such a case you would store information about previous firings or
+the number of firings in per-window state.
+
+When using windowed state it is important to also clean up that state when a window is cleared. This
+should happen in the `clear()` method.
+
 ### WindowFunction (Legacy)
 
 In some places where a `ProcessWindowFunction` can be used you can also use a `WindowFunction`. This
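
The scoping that the added section describes (state per key versus state per window instance) can be modelled without Flink. The following is a minimal plain-Java sketch, not Flink code: the class `PerWindowStateSketch`, its two maps, and the `fire`/`clear` helpers are hypothetical names that only mimic what the text says about `globalState()`, `windowState()`, and `clear()`. It assumes tumbling one-hour windows and non-negative millisecond timestamps.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of the two state scopes. It does NOT use the Flink API.
class PerWindowStateSketch {
    static final long HOUR_MS = 60 * 60 * 1000L;

    // Models globalState(): one entry per key, shared by all windows of that key.
    final Map<String, Integer> firingsPerKey = new HashMap<>();

    // Models windowState(): one entry per (key, window start), i.e. per window instance.
    final Map<SimpleImmutableEntry<String, Long>, Integer> firingsPerWindow = new HashMap<>();

    // Start of the tumbling one-hour window containing the (non-negative) timestamp.
    static long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % HOUR_MS);
    }

    // Models one firing of the window instance that holds (key, timestamp).
    // Returns how many times that particular window instance has fired so far.
    int fire(String key, long timestampMs) {
        firingsPerKey.merge(key, 1, Integer::sum);
        return firingsPerWindow.merge(
                new SimpleImmutableEntry<>(key, windowStart(timestampMs)), 1, Integer::sum);
    }

    // Models clear(): the per-window entry is dropped, the per-key entry survives.
    void clear(String key, long timestampMs) {
        firingsPerWindow.remove(new SimpleImmutableEntry<>(key, windowStart(timestampMs)));
    }

    public static void main(String[] args) {
        PerWindowStateSketch state = new PerWindowStateSketch();
        long noon = 12 * HOUR_MS;

        // 1000 keys whose events all fall into the [12:00, 13:00) window.
        for (int i = 0; i < 1000; i++) {
            state.fire("user-" + i, noon + i);
        }
        System.out.println(state.firingsPerWindow.size());          // 1000

        // A second (e.g. late) firing of the same window instance for one key.
        System.out.println(state.fire("user-0", noon + 5));         // 2

        // The next hour is a different window instance for the same key.
        System.out.println(state.fire("user-0", noon + HOUR_MS));   // 1
        System.out.println(state.firingsPerKey.get("user-0"));      // 3

        // Clearing the 12:00 window removes only its per-window entry.
        state.clear("user-0", noon);
        System.out.println(state.firingsPerWindow.size());          // 1000
    }
}
```

In an actual job the same bookkeeping would live in Flink's keyed state behind `windowState()` and `globalState()` instead of in local maps. The sketch only shows why a firing counter kept per window instance restarts for each new window while a per-key counter keeps growing.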
