[
https://issues.apache.org/jira/browse/FLINK-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838206#comment-15838206
]
ASF GitHub Bot commented on FLINK-5529:
---------------------------------------
Github user aljoscha commented on a diff in the pull request:
https://github.com/apache/flink/pull/3191#discussion_r97828488
--- Diff: docs/dev/windows.md ---
@@ -622,133 +690,138 @@ input
</div>
</div>
-## Dealing with Late Data
+## Triggers
-When working with event-time windowing it can happen that elements arrive
late, i.e the
-watermark that Flink uses to keep track of the progress of event-time is
already past the
-end timestamp of a window to which an element belongs. Please
-see [event time](./event_time.html) and especially
-[late elements](./event_time.html#late-elements) for a more thorough
discussion of
-how Flink deals with event time.
+A `Trigger` determines when a window (as formed by the `WindowAssigner`)
is ready to be
+processed by the *window function*. Each `WindowAssigner` comes with a
default `Trigger`.
+If the default trigger does not fit your needs, you can specify a custom
trigger using `trigger(...)`.
-You can specify how a windowed transformation should deal with late
elements and how much lateness
-is allowed. The parameter for this is called *allowed lateness*. This
specifies by how much time
-elements can be late. Elements that arrive within the allowed lateness are
still put into windows
-and are considered when computing window results. If elements arrive after
the allowed lateness they
-will be dropped. Flink will also make sure that any state held by the
windowing operation is garbage
-collected once the watermark passes the end of a window plus the allowed
lateness.
+The trigger interface provides five methods that react to different
events:
-<span class="label label-info">Default</span> By default, the allowed
lateness is set to
-`0`. That is, elements that arrive behind the watermark will be dropped.
+* The `onElement()` method is called for each element that is added to a
window.
+* The `onEventTime()` method is called when a registered event-time timer
fires.
+* The `onProcessingTime()` method is called when a registered
processing-time timer fires.
+* The `onMerge()` method is relevant for stateful triggers and merges the
states of two triggers when their corresponding windows merge, *e.g.* when
using session windows.
+* Finally the `clear()` method performs any action needed upon removal of
the corresponding window.
-You can specify an allowed lateness like this:
+Any of these methods can be used to register processing- or event-time
timers for future actions.
-<div class="codetabs" markdown="1">
-<div data-lang="java" markdown="1">
-{% highlight java %}
-DataStream<T> input = ...;
+### Fire and Purge
-input
- .keyBy(<key selector>)
- .window(<window assigner>)
- .allowedLateness(<time>)
- .<windowed transformation>(<window function>);
-{% endhighlight %}
-</div>
+Once a trigger determines that a window is ready for processing, it fires.
This is the signal for the window operator to emit the result of the current
window. Given a window with a `WindowFunction`
--- End diff --
Maybe we should mention earlier which trigger methods can return a
`TriggerResult` and quickly mention what that means. "A trigger fires when it
returns FIRE or FIRE_AND_PURGE."
> Improve / extends windowing documentation
> -----------------------------------------
>
> Key: FLINK-5529
> URL: https://issues.apache.org/jira/browse/FLINK-5529
> Project: Flink
> Issue Type: Sub-task
> Components: Documentation
> Reporter: Stephan Ewen
> Assignee: Kostas Kloudas
> Fix For: 1.2.0, 1.3.0
>
>
> Suggested Outline:
> {code}
> Windows
> (0) Outline: The anatomy of a window operation
> stream
> [.keyBy(...)] <- keyed versus non-keyed windows
> .window(...) <- required: "assigner"
> [.trigger(...)] <- optional: "trigger" (else default trigger)
> [.evictor(...)] <- optional: "evictor" (else no evictor)
> [.allowedLateness()] <- optional, else zero
> .reduce/fold/apply() <- required: "function"
> (1) Types of windows
> - tumble
> - slide
> - session
> - global
> (2) Pre-defined windows
> timeWindow() (tumble, slide)
> countWindow() (tumble, slide)
> - mention that count windows are inherently
> resource leaky unless limited key space
> (3) Window Functions
> - apply: most basic, iterates over elements in window
>
> - aggregating: reduce and fold, can be used with "apply()" which will get
> one element
>
> - forward reference to state size section
> (4) Advanced Windows
> - assigner
> - simple
> - merging
> - trigger
> - registering timers (processing time, event time)
> - state in triggers
> - life cycle of a window
> - create
> - state
> - cleanup
> - when is window contents purged
> - when is state dropped
> - when is metadata (like merging set) dropped
> (5) Late data
> - picture
> - fire vs fire_and_purge: late accumulates vs late resurrects (cf
> discarding mode)
>
> (6) Evictors
> - TDB
>
> (7) State size: How large will the state be?
> Basic rule: Each element has one copy per window it is assigned to
> --> num windows * num elements in window
> --> example: tumbline is one copy, sliding(n,m) is n/m copies
> --> per key
> Pre-aggregation:
> - if reduce or fold is set -> one element per window (rather than num
> elements in window)
> - evictor voids pre-aggregation from the perspective of state
> Special rules:
> - fold cannot pre-aggregate on session windows (and other merging windows)
> (8) Non-keyed windows
> - all elements through the same windows
> - currently not parallel
> - possible parallel in the future when having pre-aggregation functions
> - inherently (by definition) produce a result stream with parallelism one
> - state similar to one key of keyed windows
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)