[GitHub] [beam] melap commented on a change in pull request #15780: [BEAM-11758] Update basics page: Trigger, State and timers

GitBox Fri, 19 Nov 2021 12:02:13 -0800


melap commented on a change in pull request #15780:
URL: https://github.com/apache/beam/pull/15780#discussion_r753485092




##########
File path: website/www/site/content/en/documentation/basics.md
##########
@@ -365,6 +369,108 @@ For more information about runners, see the following pages:
  * [Choosing a Runner](/documentation/#choosing-a-runner)
  * [Beam Capability Matrix](/documentation/runners/capability-matrix/)
 
+### Trigger
+
+When collecting and grouping data into windows, Beam uses _triggers_ to
+determine when to emit the aggregated results of each window (referred to as a
+_pane_). If you use Beam’s default windowing configuration and default trigger,
+Beam outputs the aggregated result when it estimates all data has arrived, and
+discards all subsequent data for that window.
+
+At a high level, triggers provide two additional capabilities compared to
+outputting at the end of a window:
+
+ 1. Triggers allow Beam to emit early results, before all the data in a given
+    window has arrived. For example, emitting after a certain amount of time
+    elapses, or after a certain number of elements arrives.
+ 2. Triggers allow processing of late data by triggering after the event time
+    watermark passes the end of the window.
+
+These capabilities allow you to control the flow of your data and also balance
+between data completeness, latency, and cost.
+
+Beam provides a number of pre-built triggers that you can set:
+
+ * **Event time triggers**: These triggers operate on the event time, as
+   indicated by the timestamp on each data element. Beam’s default trigger is
+   event time-based.
+ * **Processing time triggers**: These triggers operate on the processing time,
+   which is the time when the data element is processed at any given stage in
+   the pipeline.
+ * **Data-driven triggers**: These triggers operate by examining the data as it
+   arrives in each window, and firing when that data meets a certain property.
+   Currently, data-driven triggers only support firing after a certain number of
+   data elements.
+ * **Composite triggers**: These triggers combine multiple triggers in various
+   ways.
+
+For more information about triggers, see the following page:
+
+ * [Beam Programming Guide: Triggers](/documentation/programming-guide/#triggers)
+
+### State and timers
+
+Beam’s windowing and triggers provide an abstraction for grouping and
+aggregating unbounded input data based on timestamps. However, there are
+aggregation use cases that might require an even higher degree of control. State
+and timers are two important concepts that help with these use cases.
+
+**State**:
+
+Beam provides the State API for manually managing per-key state, allowing for
+fine-grained control over aggregations.  The State API lets you augment
+element-wise operations (for example, `ParDo` or `Map`) with mutable state.
+
+The State API models state per key. To use the state API, you start out with a
+keyed `PCollection`. A `ParDo` that processes this `PCollection` can declare
+persistent state variables. When you process each element inside the `ParDo`,
+you can use the state variables to write or update state for the current key or
+to read previous state written for that key. State is always scoped to the
+current processing key.
+
+Beam provides several types of state:

Review comment:
       Done
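
For readers following the thread without the rendered page, the per-key model described at the end of the hunk above can be sketched in plain Python. This is only a toy model and does not use the Beam SDK: `KeyedCounter` and its dictionary-backed store are hypothetical names standing in for a stateful `ParDo` and the runner-managed state.

```python
from collections import defaultdict


class KeyedCounter:
    """Toy stand-in for a stateful ParDo: one integer of state per key."""

    def __init__(self):
        # Hypothetical state store; a real runner manages and persists this.
        self._count_by_key = defaultdict(int)

    def process(self, element):
        key, _value = element
        # Read the previous state for this key only, update it, write it back.
        self._count_by_key[key] += 1
        return key, self._count_by_key[key]


counter = KeyedCounter()
events = [("a", 10), ("b", 20), ("a", 30)]
outputs = [counter.process(event) for event in events]
print(outputs)  # [('a', 1), ('b', 1), ('a', 2)]
```

The second `"a"` element sees the count written by the first one, while `"b"` starts from its own empty state, which is the scoping the text describes.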

##########
File path: website/www/site/content/en/documentation/basics.md
##########
@@ -365,6 +369,108 @@ For more information about runners, see the following pages:
  * [Choosing a Runner](/documentation/#choosing-a-runner)
  * [Beam Capability Matrix](/documentation/runners/capability-matrix/)
 
+### Trigger
+
+When collecting and grouping data into windows, Beam uses _triggers_ to
+determine when to emit the aggregated results of each window (referred to as a
+_pane_). If you use Beam’s default windowing configuration and default trigger,
+Beam outputs the aggregated result when it estimates all data has arrived, and
+discards all subsequent data for that window.
+
+At a high level, triggers provide two additional capabilities compared to
+outputting at the end of a window:
+
+ 1. Triggers allow Beam to emit early results, before all the data in a given
+    window has arrived. For example, emitting after a certain amount of time
+    elapses, or after a certain number of elements arrives.
+ 2. Triggers allow processing of late data by triggering after the event time
+    watermark passes the end of the window.
+
+These capabilities allow you to control the flow of your data and also balance
+between data completeness, latency, and cost.
+
+Beam provides a number of pre-built triggers that you can set:
+
+ * **Event time triggers**: These triggers operate on the event time, as
+   indicated by the timestamp on each data element. Beam’s default trigger is
+   event time-based.
+ * **Processing time triggers**: These triggers operate on the processing time,
+   which is the time when the data element is processed at any given stage in
+   the pipeline.
+ * **Data-driven triggers**: These triggers operate by examining the data as it
+   arrives in each window, and firing when that data meets a certain property.
+   Currently, data-driven triggers only support firing after a certain number of
+   data elements.
+ * **Composite triggers**: These triggers combine multiple triggers in various
+   ways.
+
+For more information about triggers, see the following page:
+
+ * [Beam Programming Guide: Triggers](/documentation/programming-guide/#triggers)
+
+### State and timers
+
+Beam’s windowing and triggers provide an abstraction for grouping and
+aggregating unbounded input data based on timestamps. However, there are
+aggregation use cases that might require an even higher degree of control. State
+and timers are two important concepts that help with these use cases.
+
+**State**:
+
+Beam provides the State API for manually managing per-key state, allowing for
+fine-grained control over aggregations.  The State API lets you augment
+element-wise operations (for example, `ParDo` or `Map`) with mutable state.
+
+The State API models state per key. To use the state API, you start out with a
+keyed `PCollection`. A `ParDo` that processes this `PCollection` can declare
+persistent state variables. When you process each element inside the `ParDo`,
+you can use the state variables to write or update state for the current key or
+to read previous state written for that key. State is always scoped to the
+current processing key.
+
+Beam provides several types of state:
+
+ * **ValueState**: A ValueState is a scalar state value. For each key in the
+   input, a ValueState stores a typed value that can be read and modified inside
+   the `DoFn`.
+ * **CombiningState**: CombiningState allows you to create a state object that is
+   updated using a Beam combiner.
+ * **BagState**: A common use case for state is to accumulate multiple elements.
+   BagState allows you to accumulate an unordered collection of elements. This
+   lets you add elements to the collection without needing to read the entire
+   collection first.
+
+You can use the State API together with the Timer API to create processing tasks
+that give you fine-grained control over the workflow.
+
+**Timers**:

Review comment:
       Done
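
The difference between the three state types listed in the hunk above may be easier to see side by side. The classes below are a minimal plain-Python sketch that only borrows the names from the text. They are not the Beam SDK classes, and the method names `read`, `write`, and `add` are assumptions made for illustration.

```python
class ValueState:
    """Holds a single value that is overwritten on each write."""

    def __init__(self):
        self._value = None

    def write(self, value):
        self._value = value

    def read(self):
        return self._value


class CombiningState:
    """Folds each added element into an accumulator with a combiner."""

    def __init__(self, combine_fn, initial):
        self._combine_fn = combine_fn
        self._accumulator = initial

    def add(self, value):
        self._accumulator = self._combine_fn(self._accumulator, value)

    def read(self):
        return self._accumulator


class BagState:
    """Appends elements without reading what is already stored."""

    def __init__(self):
        self._items = []

    def add(self, value):
        self._items.append(value)

    def read(self):
        return list(self._items)


# State for a single key; a runner would keep one such set per key.
latest = ValueState()
total = CombiningState(lambda a, b: a + b, 0)
seen = BagState()
for value in [3, 1, 2]:
    latest.write(value)
    total.add(value)
    seen.add(value)

print(latest.read(), total.read(), sorted(seen.read()))  # 2 6 [1, 2, 3]
```

The value state keeps only the last write, the combining state keeps only the running sum, and the bag keeps every element, sorted here only for printing because a bag carries no ordering guarantee.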

##########
File path: website/www/site/content/en/documentation/basics.md
##########
@@ -42,6 +42,10 @@ understand an important set of core concepts:
    them to a runner.
  * [_Runner_](#runner) - A runner runs a Beam pipeline using the capabilities of
    your chosen data processing engine.
+ * [_Trigger_](#trigger) - A trigger determines when to emit the aggregated
+   results of each window.
+ * [_State and timers_](#state-and-timers) - Per-key state and timer callbacks

Review comment:
       Done

##########
File path: website/www/site/content/en/documentation/basics.md
##########
@@ -365,6 +369,108 @@ For more information about runners, see the following pages:
  * [Choosing a Runner](/documentation/#choosing-a-runner)
  * [Beam Capability Matrix](/documentation/runners/capability-matrix/)
 
+### Trigger
+
+When collecting and grouping data into windows, Beam uses _triggers_ to
+determine when to emit the aggregated results of each window (referred to as a
+_pane_). If you use Beam’s default windowing configuration and default trigger,
+Beam outputs the aggregated result when it estimates all data has arrived, and
+discards all subsequent data for that window.
+
+At a high level, triggers provide two additional capabilities compared to
+outputting at the end of a window:
+
+ 1. Triggers allow Beam to emit early results, before all the data in a given
+    window has arrived. For example, emitting after a certain amount of time
+    elapses, or after a certain number of elements arrives.
+ 2. Triggers allow processing of late data by triggering after the event time
+    watermark passes the end of the window.
+
+These capabilities allow you to control the flow of your data and also balance
+between data completeness, latency, and cost.
+
+Beam provides a number of pre-built triggers that you can set:
+
+ * **Event time triggers**: These triggers operate on the event time, as
+   indicated by the timestamp on each data element. Beam’s default trigger is
+   event time-based.
+ * **Processing time triggers**: These triggers operate on the processing time,
+   which is the time when the data element is processed at any given stage in
+   the pipeline.
+ * **Data-driven triggers**: These triggers operate by examining the data as it
+   arrives in each window, and firing when that data meets a certain property.
+   Currently, data-driven triggers only support firing after a certain number of
+   data elements.
+ * **Composite triggers**: These triggers combine multiple triggers in various
+   ways.
+
+For more information about triggers, see the following page:
+
+ * [Beam Programming Guide: Triggers](/documentation/programming-guide/#triggers)
+
+### State and timers
+
+Beam’s windowing and triggers provide an abstraction for grouping and
+aggregating unbounded input data based on timestamps. However, there are
+aggregation use cases that might require an even higher degree of control. State
+and timers are two important concepts that help with these use cases.
+
+**State**:
+
+Beam provides the State API for manually managing per-key state, allowing for

Review comment:
       Done, added to both sections also
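
The Trigger section quoted above can be illustrated with a small, Beam-free simulation of one window. Everything here is an assumption made for illustration: `run_window`, the `(watermark, value)` arrival pairs, and the choice to clear the buffer after each early pane are hypothetical and are not Beam's API.

```python
def run_window(arrivals, window_end, early_every=None):
    """Return the panes emitted for one window in this toy model.

    arrivals: (watermark_when_seen, value) pairs in arrival order.
    early_every: if set, emit an early pane after this many buffered elements.
    """
    panes, buffer, closed = [], [], False
    for watermark, value in arrivals:
        if not closed and watermark >= window_end:
            # The watermark passed the end of the window: emit the on-time pane.
            panes.append(("on_time", list(buffer)))
            buffer.clear()
            closed = True
        if closed:
            # Default behaviour described in the text: later data is discarded.
            continue
        buffer.append(value)
        if early_every and len(buffer) == early_every:
            # Data-driven trigger: fire early after a fixed number of elements.
            panes.append(("early", list(buffer)))
            buffer.clear()
    if not closed:
        # Input ended before the watermark reached the end of the window.
        panes.append(("on_time", list(buffer)))
    return panes


arrivals = [(1, "a"), (2, "b"), (3, "c"), (11, "late")]
default_panes = run_window(arrivals, window_end=10)
early_panes = run_window(arrivals, window_end=10, early_every=2)
print(default_panes)  # [('on_time', ['a', 'b', 'c'])]
print(early_panes)    # [('early', ['a', 'b']), ('on_time', ['c'])]
```

With no early firing the window produces a single pane once the watermark passes its end. With `early_every=2` a partial result is available sooner, at the cost of more output, which is the latency and cost trade-off the section mentions.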




