[GitHub] [beam] alxp1982 commented on a diff in pull request #25258: [Tour of Beam] Learning content for "Windowing" module

via GitHub Tue, 14 Feb 2023 01:46:19 -0800


alxp1982 commented on code in PR #25258:
URL: https://github.com/apache/beam/pull/25258#discussion_r1105462640



##########
learning/tour-of-beam/learning-content/windowing/windowing-concept/description.md:
##########
@@ -0,0 +1,57 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Windowing
+
+Windowing subdivides a `PCollection` according to the timestamps of its 
individual elements. Transforms that aggregate multiple elements, such as 
GroupByKey and Combine, work implicitly on a per-window basis — they process 
each PCollection as a succession of multiple, finite windows, though the entire 
collection itself may be of unbounded size.
+
+Some Beam transforms, such as `GroupByKey` and `Combine`, group multiple 
elements by a common key. Ordinarily, that grouping operation groups all the 
elements that have the same key within the entire data set. With an unbounded 
data set, it is impossible to collect all the elements, since new elements are 
constantly being added and may be infinitely many (e.g. streaming data). If you 
are working with unbounded PCollections, windowing is especially useful.
+
+
+`Fixed time windows` are useful for performing time-based aggregations, such 
as counting the number of elements that arrived during each hour of the day. It 
allows you to group elements of a data set into fixed-length, non-overlapping 
time intervals, which can be useful for a variety of use cases.
+For example, imagine you have a stream of data that is recording the number of 
website visitors every second, and you want to know the total number of 
visitors for each hour of the day. Using fixed-time windows, you can group the 
data into hour-long windows and then perform a sum aggregation on each window 
to get the total number of visitors for each hour.
+
+Additionally, fixed time window can also be useful when dealing with data that 
arrives out-of-order, or when dealing with late data. By specifying a fixed 
window duration, you can ensure that all elements that belong to a particular 
window are processed together, regardless of when they arrived.
+
+In summary, fixed time windows are useful for performing time-based 
aggregations and for handling out-of-order or late data.
+
+
+`Sliding time windows` are similar to fixed time windows, but they have the 
added ability to move or slide over the data stream, allowing them to overlap 
with each other.

Review Comment:
   `Sliding time windows` are similar to fixed time windows, but they have the 
added ability to move or slide over the data stream, allowing them to overlap.



##########
learning/tour-of-beam/learning-content/windowing/windowing-concept/description.md:
##########
@@ -0,0 +1,57 @@
+### Windowing
+
+Windowing subdivides a `PCollection` according to the timestamps of its 
individual elements. Transforms that aggregate multiple elements, such as 
GroupByKey and Combine, work implicitly on a per-window basis — they process 
each PCollection as a succession of multiple, finite windows, though the entire 
collection itself may be of unbounded size.
+
+Some Beam transforms, such as `GroupByKey` and `Combine`, group multiple 
elements by a common key. Ordinarily, that grouping operation groups all the 
elements that have the same key within the entire data set. With an unbounded 
data set, it is impossible to collect all the elements, since new elements are 
constantly being added and may be infinitely many (e.g. streaming data). If you 
are working with unbounded PCollections, windowing is especially useful.
+
+
+`Fixed time windows` are useful for performing time-based aggregations, such 
as counting the number of elements that arrived during each hour of the day. It 
allows you to group elements of a data set into fixed-length, non-overlapping 
time intervals, which can be useful for a variety of use cases.

Review Comment:
   `Fixed time windows` are useful for performing time-based aggregations, such 
as counting the number of elements that arrived during each hour of the day. 
They allow you to group elements of a data set into **fixed-length, 
non-overlapping** time intervals, which can be useful for a variety of use 
cases.
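
   A short example could follow this paragraph. Below is a minimal sketch in 
plain Python (not the Beam API) of how fixed-length, non-overlapping windows 
assign elements by timestamp. The function names, the integer-second 
timestamps, and the sample events are all assumptions made for illustration.

```python
from collections import defaultdict


def fixed_window_start(timestamp: int, size: int) -> int:
    """Return the start of the fixed window [start, start + size) holding timestamp."""
    # Windows are aligned to multiples of `size`, so they never overlap.
    return timestamp - timestamp % size


def sum_per_fixed_window(events, size):
    """Sum the values of (timestamp, value) pairs per fixed window."""
    totals = defaultdict(int)
    for timestamp, value in events:
        totals[fixed_window_start(timestamp, size)] += value
    return dict(totals)


# Hypothetical visitor counts as (event time in seconds, visitors).
# The last element arrives out of order but still lands in the first hour.
events = [(10, 3), (3599, 2), (3600, 5), (7300, 1), (20, 4)]
hourly = sum_per_fixed_window(events, size=3600)
print(hourly)  # {0: 9, 3600: 5, 7200: 1}
```

The window is chosen from the element's own timestamp, not from its arrival 
order, which is why the late `(20, 4)` element is counted in the first hour.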



##########
learning/tour-of-beam/learning-content/windowing/windowing-concept/description.md:
##########
@@ -0,0 +1,57 @@
+### Windowing
+
+Windowing subdivides a `PCollection` according to the timestamps of its 
individual elements. Transforms that aggregate multiple elements, such as 
GroupByKey and Combine, work implicitly on a per-window basis — they process 
each PCollection as a succession of multiple, finite windows, though the entire 
collection itself may be of unbounded size.
+
+Some Beam transforms, such as `GroupByKey` and `Combine`, group multiple 
elements by a common key. Ordinarily, that grouping operation groups all the 
elements that have the same key within the entire data set. With an unbounded 
data set, it is impossible to collect all the elements, since new elements are 
constantly being added and may be infinitely many (e.g. streaming data). If you 
are working with unbounded PCollections, windowing is especially useful.
+
+
+`Fixed time windows` are useful for performing time-based aggregations, such 
as counting the number of elements that arrived during each hour of the day. It 
allows you to group elements of a data set into fixed-length, non-overlapping 
time intervals, which can be useful for a variety of use cases.
+For example, imagine you have a stream of data that is recording the number of 
website visitors every second, and you want to know the total number of 
visitors for each hour of the day. Using fixed-time windows, you can group the 
data into hour-long windows and then perform a sum aggregation on each window 
to get the total number of visitors for each hour.
+
+Additionally, fixed time window can also be useful when dealing with data that 
arrives out-of-order, or when dealing with late data. By specifying a fixed 
window duration, you can ensure that all elements that belong to a particular 
window are processed together, regardless of when they arrived.
+
+In summary, fixed time windows are useful for performing time-based 
aggregations and for handling out-of-order or late data.
+
+
+`Sliding time windows` are similar to fixed time windows, but they have the 
added ability to move or slide over the data stream, allowing them to overlap 
with each other.
+
+One of the main use cases for sliding time windows is to compute running 
aggregates. For example, if you want to compute a running average of the past 
60 seconds’ worth of data updated every 30 seconds, you can use sliding time 
windows. This is done by defining a window duration of 60 seconds and a sliding 
interval of 30 seconds. With this configuration, you will have windows that 
slide every 30 seconds, each one covering a 60-second interval.
+
+Another use case for sliding time windows is to perform anomaly detection. By 
computing the running aggregates over a sliding window, you can detect patterns 
that deviate significantly from the historical data.
+
+Sliding time windows also allows to look at data in a more dynamic way. This 
is useful when you have a high-frequency data stream and you want to look at 
the most recent data.

Review Comment:
   Sliding time windows also allow looking at data more dynamically. This is 
useful when you have a high-frequency data stream, and you want to look at the 
most recent data.
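
   The 60-second duration and 30-second interval described above might also 
benefit from a worked example. The following is a rough sketch in plain Python 
(not the Beam API); the helper names and sample readings are hypothetical, and 
timestamps are assumed to be integer seconds.

```python
from collections import defaultdict


def sliding_window_starts(timestamp: int, size: int, period: int):
    """Return the starts of every window [start, start + size) containing timestamp."""
    starts = []
    # The latest window containing the element starts on a multiple of `period`.
    start = timestamp - timestamp % period
    while start > timestamp - size:
        starts.append(start)
        start -= period
    return sorted(starts)


def average_per_sliding_window(events, size, period):
    """Average the values of (timestamp, value) pairs per sliding window."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, value in events:
        for start in sliding_window_starts(timestamp, size, period):
            sums[start] += value
            counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# With a 60-second size and a 30-second period, each element is in two windows.
print(sliding_window_starts(75, size=60, period=30))  # [30, 60]

events = [(5, 10.0), (40, 20.0), (75, 30.0)]
averages = average_per_sliding_window(events, size=60, period=30)
print(averages)  # {-30: 10.0, 0: 15.0, 30: 25.0, 60: 30.0}
```

Because the size is twice the period, consecutive windows overlap by 30 
seconds, so every reading contributes to two running averages.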



##########
learning/tour-of-beam/learning-content/windowing/windowing-concept/description.md:
##########
@@ -0,0 +1,57 @@
+### Windowing
+
+Windowing subdivides a `PCollection` according to the timestamps of its 
individual elements. Transforms that aggregate multiple elements, such as 
GroupByKey and Combine, work implicitly on a per-window basis — they process 
each PCollection as a succession of multiple, finite windows, though the entire 
collection itself may be of unbounded size.
+
+Some Beam transforms, such as `GroupByKey` and `Combine`, group multiple 
elements by a common key. Ordinarily, that grouping operation groups all the 
elements that have the same key within the entire data set. With an unbounded 
data set, it is impossible to collect all the elements, since new elements are 
constantly being added and may be infinitely many (e.g. streaming data). If you 
are working with unbounded PCollections, windowing is especially useful.
+
+
+`Fixed time windows` are useful for performing time-based aggregations, such 
as counting the number of elements that arrived during each hour of the day. It 
allows you to group elements of a data set into fixed-length, non-overlapping 
time intervals, which can be useful for a variety of use cases.
+For example, imagine you have a stream of data that is recording the number of 
website visitors every second, and you want to know the total number of 
visitors for each hour of the day. Using fixed-time windows, you can group the 
data into hour-long windows and then perform a sum aggregation on each window 
to get the total number of visitors for each hour.
+
+Additionally, fixed time window can also be useful when dealing with data that 
arrives out-of-order, or when dealing with late data. By specifying a fixed 
window duration, you can ensure that all elements that belong to a particular 
window are processed together, regardless of when they arrived.
+
+In summary, fixed time windows are useful for performing time-based 
aggregations and for handling out-of-order or late data.
+
+
+`Sliding time windows` are similar to fixed time windows, but they have the 
added ability to move or slide over the data stream, allowing them to overlap 
with each other.
+
+One of the main use cases for sliding time windows is to compute running 
aggregates. For example, if you want to compute a running average of the past 
60 seconds’ worth of data updated every 30 seconds, you can use sliding time 
windows. This is done by defining a window duration of 60 seconds and a sliding 
interval of 30 seconds. With this configuration, you will have windows that 
slide every 30 seconds, each one covering a 60-second interval.
+
+Another use case for sliding time windows is to perform anomaly detection. By 
computing the running aggregates over a sliding window, you can detect patterns 
that deviate significantly from the historical data.
+
+Sliding time windows also allows to look at data in a more dynamic way. This 
is useful when you have a high-frequency data stream and you want to look at 
the most recent data.
+
+In summary, Sliding time windows are useful for performing running 
aggregations, anomaly detection and looking at data in a more dynamic way.
+
+
+`Session windows` are a type of windowing that groups data elements based on 
periods of inactivity or "gaps" in the data stream. They are useful when you 
want to group data elements that are related to a specific event or activity 
together.
+
+One of the main use cases for session windows is to group together data 
elements that are related to a user's session on a website or application. By 
using session windows with a relatively short gap duration, you can ensure that 
all the events related to a user's session are grouped together. This allows 
you to compute session-level metrics, such as the number of pages viewed per 
session, the duration of a session, or the number of events per session.
+
+Another use case for session windows is to group together data elements that 
are related to a specific device's usage. For example, if you are collecting 
sensor data, you can use session windows to group together data elements that 
are collected while the device is in use. This allows you to compute 
device-level metrics, such as the number of sensor readings per device, the 
duration of device usage, or the number of events per device.

Review Comment:
   Another use case for session windows is to **group data elements related to 
a specific device's usage**. For example, if you are collecting sensor data, 
you can use session windows to group data elements collected while the device 
is in use. This allows you to compute device-level metrics, such as the number 
of sensor readings per device, the duration of device usage, or the number of 
events per device.
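
   A small example of the device-usage case might help here. This is a minimal 
sketch in plain Python (not the Beam API) for a single device, assuming 
integer-second timestamps and a hypothetical 60-second gap duration; the 
function name and the sample readings are illustrative only.

```python
def split_into_sessions(timestamps, gap):
    """Group timestamps into sessions separated by at least `gap` seconds of inactivity."""
    sessions = []
    for ts in sorted(timestamps):
        # Extend the current session while the next reading arrives within the gap.
        if sessions and ts - sessions[-1][-1] < gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


# Hypothetical sensor reading times (seconds) for one device.
readings = [0, 20, 50, 400, 410, 1000]
sessions = split_into_sessions(readings, gap=60)
print(sessions)  # [[0, 20, 50], [400, 410], [1000]]

# Per-session metrics: (number of readings, duration in seconds).
metrics = [(len(s), s[-1] - s[0]) for s in sessions]
print(metrics)  # [(3, 50), (2, 10), (1, 0)]
```

In a pipeline this grouping would apply per key (for example, per device ID), 
so one device's inactivity does not close another device's session.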



##########
learning/tour-of-beam/learning-content/windowing/windowing-concept/description.md:
##########
@@ -0,0 +1,57 @@
+### Windowing
+
+Windowing subdivides a `PCollection` according to the timestamps of its 
individual elements. Transforms that aggregate multiple elements, such as 
GroupByKey and Combine, work implicitly on a per-window basis — they process 
each PCollection as a succession of multiple, finite windows, though the entire 
collection itself may be of unbounded size.
+
+Some Beam transforms, such as `GroupByKey` and `Combine`, group multiple 
elements by a common key. Ordinarily, that grouping operation groups all the 
elements that have the same key within the entire data set. With an unbounded 
data set, it is impossible to collect all the elements, since new elements are 
constantly being added and may be infinitely many (e.g. streaming data). If you 
are working with unbounded PCollections, windowing is especially useful.
+
+
+`Fixed time windows` are useful for performing time-based aggregations, such 
as counting the number of elements that arrived during each hour of the day. It 
allows you to group elements of a data set into fixed-length, non-overlapping 
time intervals, which can be useful for a variety of use cases.
+For example, imagine you have a stream of data that is recording the number of 
website visitors every second, and you want to know the total number of 
visitors for each hour of the day. Using fixed-time windows, you can group the 
data into hour-long windows and then perform a sum aggregation on each window 
to get the total number of visitors for each hour.
+
+Additionally, fixed time window can also be useful when dealing with data that 
arrives out-of-order, or when dealing with late data. By specifying a fixed 
window duration, you can ensure that all elements that belong to a particular 
window are processed together, regardless of when they arrived.
+
+In summary, fixed time windows are useful for performing time-based 
aggregations and for handling out-of-order or late data.
+
+
+`Sliding time windows` are similar to fixed time windows, but they have the 
added ability to move or slide over the data stream, allowing them to overlap 
with each other.
+
+One of the main use cases for sliding time windows is to compute running 
aggregates. For example, if you want to compute a running average of the past 
60 seconds’ worth of data updated every 30 seconds, you can use sliding time 
windows. This is done by defining a window duration of 60 seconds and a sliding 
interval of 30 seconds. With this configuration, you will have windows that 
slide every 30 seconds, each one covering a 60-second interval.
+
+Another use case for sliding time windows is to perform anomaly detection. By 
computing the running aggregates over a sliding window, you can detect patterns 
that deviate significantly from the historical data.
+
+Sliding time windows also allows to look at data in a more dynamic way. This 
is useful when you have a high-frequency data stream and you want to look at 
the most recent data.
+
+In summary, Sliding time windows are useful for performing running 
aggregations, anomaly detection and looking at data in a more dynamic way.
+
+
+`Session windows` are a type of windowing that groups data elements based on 
periods of inactivity or "gaps" in the data stream. They are useful when you 
want to group data elements that are related to a specific event or activity 
together.
+
+One of the main use cases for session windows is to group together data 
elements that are related to a user's session on a website or application. By 
using session windows with a relatively short gap duration, you can ensure that 
all the events related to a user's session are grouped together. This allows 
you to compute session-level metrics, such as the number of pages viewed per 
session, the duration of a session, or the number of events per session.
+
+Another use case for session windows is to group together data elements that 
are related to a specific device's usage. For example, if you are collecting 
sensor data, you can use session windows to group together data elements that 
are collected while the device is in use. This allows you to compute 
device-level metrics, such as the number of sensor readings per device, the 
duration of device usage, or the number of events per device.
+
+In summary, session windows are useful for grouping data elements that are 
related to specific events or activities, such as user sessions or device 
usage. This allows you to compute event- or device-level metrics.
+
+
+A `single global window` is a type of windowing that treats all data elements 
as belonging to the same window. This means that all elements in the data 
stream are processed together and no windowing is applied.
+
+The main use case for a single global window is when you want to process all 
the data elements in your data stream as a whole, without breaking them up into 
smaller windows. This can be useful in situations where you don't need to 
compute window-level metrics, such as running averages or counts, but instead 
want to process the entire data stream as a single unit.

Review Comment:
   The primary use case for a single global window is when you want to process 
all the data elements in your data stream **as a whole** without breaking them 
up into smaller windows. For example, this can be useful when you don't need to 
compute window-level metrics, such as running averages or counts, but instead, 
you want to process the entire data stream as a single unit.



##########
learning/tour-of-beam/learning-content/windowing/windowing-concept/description.md:
##########
@@ -0,0 +1,57 @@
+### Windowing
+
+Windowing subdivides a `PCollection` according to the timestamps of its 
individual elements. Transforms that aggregate multiple elements, such as 
GroupByKey and Combine, work implicitly on a per-window basis — they process 
each PCollection as a succession of multiple, finite windows, though the entire 
collection itself may be of unbounded size.
+
+Some Beam transforms, such as `GroupByKey` and `Combine`, group multiple 
elements by a common key. Ordinarily, that grouping operation groups all the 
elements that have the same key within the entire data set. With an unbounded 
data set, it is impossible to collect all the elements, since new elements are 
constantly being added and may be infinitely many (e.g. streaming data). If you 
are working with unbounded PCollections, windowing is especially useful.
+
+
+`Fixed time windows` are useful for performing time-based aggregations, such 
as counting the number of elements that arrived during each hour of the day. It 
allows you to group elements of a data set into fixed-length, non-overlapping 
time intervals, which can be useful for a variety of use cases.
+For example, imagine you have a stream of data that is recording the number of 
website visitors every second, and you want to know the total number of 
visitors for each hour of the day. Using fixed-time windows, you can group the 
data into hour-long windows and then perform a sum aggregation on each window 
to get the total number of visitors for each hour.
+
+Additionally, fixed time window can also be useful when dealing with data that 
arrives out-of-order, or when dealing with late data. By specifying a fixed 
window duration, you can ensure that all elements that belong to a particular 
window are processed together, regardless of when they arrived.
+
+In summary, fixed time windows are useful for performing time-based 
aggregations and for handling out-of-order or late data.
+
+
+`Sliding time windows` are similar to fixed time windows, but they have the 
added ability to move or slide over the data stream, allowing them to overlap 
with each other.
+
+One of the main use cases for sliding time windows is to compute running 
aggregates. For example, if you want to compute a running average of the past 
60 seconds’ worth of data updated every 30 seconds, you can use sliding time 
windows. This is done by defining a window duration of 60 seconds and a sliding 
interval of 30 seconds. With this configuration, you will have windows that 
slide every 30 seconds, each one covering a 60-second interval.
+
+Another use case for sliding time windows is to perform anomaly detection. By 
computing the running aggregates over a sliding window, you can detect patterns 
that deviate significantly from the historical data.
+
+Sliding time windows also allows to look at data in a more dynamic way. This 
is useful when you have a high-frequency data stream and you want to look at 
the most recent data.
+
+In summary, Sliding time windows are useful for performing running 
aggregations, anomaly detection and looking at data in a more dynamic way.
+
+
+`Session windows` are a type of windowing that groups data elements based on 
periods of inactivity or "gaps" in the data stream. They are useful when you 
want to group data elements that are related to a specific event or activity 
together.
+
+One of the main use cases for session windows is to group together data 
elements that are related to a user's session on a website or application. By 
using session windows with a relatively short gap duration, you can ensure that 
all the events related to a user's session are grouped together. This allows 
you to compute session-level metrics, such as the number of pages viewed per 
session, the duration of a session, or the number of events per session.
+
+Another use case for session windows is to group together data elements that 
are related to a specific device's usage. For example, if you are collecting 
sensor data, you can use session windows to group together data elements that 
are collected while the device is in use. This allows you to compute 
device-level metrics, such as the number of sensor readings per device, the 
duration of device usage, or the number of events per device.
+
+In summary, session windows are useful for grouping data elements that are 
related to specific events or activities, such as user sessions or device 
usage. This allows you to compute event- or device-level metrics.
+
+
+A `single global window` is a type of windowing that treats all data elements 
as belonging to the same window. This means that all elements in the data 
stream are processed together and no windowing is applied.
+
+The main use case for a single global window is when you want to process all 
the data elements in your data stream as a whole, without breaking them up into 
smaller windows. This can be useful in situations where you don't need to 
compute window-level metrics, such as running averages or counts, but instead 
want to process the entire data stream as a single unit.
+
+For example, if you are using a data pipeline to filter out invalid data 
elements and then store the remaining data in a database, you might use a 
single global window to process all the data elements together, without 
breaking them up into smaller windows.
+
+Another use case is when your data streams are already time-stamped and you 
want to process events in the order they arrive, so you don't want to group 
them based on time windows.
+
+In summary, a single global window is useful when you want to process all the 
data elements in your data stream as a whole, without breaking them up into 
smaller windows. It can be useful for situations where you don't need to 
compute window-level metrics, or for processing events in the order they arrive.

Review Comment:
   In summary, a single global window is useful when you want to process all 
the data elements in your data stream as a whole without breaking them up into 
smaller windows. In addition, it can be helpful for situations **where you 
don't need to compute window-level metrics or for processing events in the 
order they arrive**.



##########
learning/tour-of-beam/learning-content/windowing/windowing-concept/description.md:
##########
@@ -0,0 +1,57 @@
+### Windowing
+
+Windowing subdivides a `PCollection` according to the timestamps of its 
individual elements. Transforms that aggregate multiple elements, such as 
GroupByKey and Combine, work implicitly on a per-window basis — they process 
each PCollection as a succession of multiple, finite windows, though the entire 
collection itself may be of unbounded size.
+
+Some Beam transforms, such as `GroupByKey` and `Combine`, group multiple 
elements by a common key. Ordinarily, that grouping operation groups all the 
elements that have the same key within the entire data set. With an unbounded 
data set, it is impossible to collect all the elements, since new elements are 
constantly being added and may be infinitely many (e.g. streaming data). If you 
are working with unbounded PCollections, windowing is especially useful.
+
+
+`Fixed time windows` are useful for performing time-based aggregations, such 
as counting the number of elements that arrived during each hour of the day. It 
allows you to group elements of a data set into fixed-length, non-overlapping 
time intervals, which can be useful for a variety of use cases.
+For example, imagine you have a stream of data that is recording the number of 
website visitors every second, and you want to know the total number of 
visitors for each hour of the day. Using fixed-time windows, you can group the 
data into hour-long windows and then perform a sum aggregation on each window 
to get the total number of visitors for each hour.
+
+Additionally, fixed time window can also be useful when dealing with data that 
arrives out-of-order, or when dealing with late data. By specifying a fixed 
window duration, you can ensure that all elements that belong to a particular 
window are processed together, regardless of when they arrived.
+
+In summary, fixed time windows are useful for performing time-based 
aggregations and for handling out-of-order or late data.
+
+
+`Sliding time windows` are similar to fixed time windows, but they have the 
added ability to move or slide over the data stream, allowing them to overlap 
with each other.
+
+One of the main use cases for sliding time windows is to compute running 
aggregates. For example, if you want to compute a running average of the past 
60 seconds’ worth of data updated every 30 seconds, you can use sliding time 
windows. This is done by defining a window duration of 60 seconds and a sliding 
interval of 30 seconds. With this configuration, you will have windows that 
slide every 30 seconds, each one covering a 60-second interval.
+
+Another use case for sliding time windows is to perform anomaly detection. By 
computing the running aggregates over a sliding window, you can detect patterns 
that deviate significantly from the historical data.
+
+Sliding time windows also allows to look at data in a more dynamic way. This 
is useful when you have a high-frequency data stream and you want to look at 
the most recent data.
+
+In summary, Sliding time windows are useful for performing running 
aggregations, anomaly detection and looking at data in a more dynamic way.
+
+
+`Session windows` are a type of windowing that groups data elements based on 
periods of inactivity or "gaps" in the data stream. They are useful when you 
want to group data elements that are related to a specific event or activity 
together.
+
+One of the main use cases for session windows is to group together data 
elements that are related to a user's session on a website or application. By 
using session windows with a relatively short gap duration, you can ensure that 
all the events related to a user's session are grouped together. This allows 
you to compute session-level metrics, such as the number of pages viewed per 
session, the duration of a session, or the number of events per session.
+
+Another use case for session windows is to group together data elements that 
are related to a specific device's usage. For example, if you are collecting 
sensor data, you can use session windows to group together data elements that 
are collected while the device is in use. This allows you to compute 
device-level metrics, such as the number of sensor readings per device, the 
duration of device usage, or the number of events per device.
+
+In summary, session windows are useful for grouping data elements that are 
related to specific events or activities, such as user sessions or device 
usage. This allows you to compute event- or device-level metrics.

Review Comment:
   In summary, session windows help **group data elements related to specific 
events or activities, such as user sessions or device usage**. This allows you 
to compute event- or device-level metrics.



##########
learning/tour-of-beam/learning-content/windowing/fixed-time-window/description.md:
##########
@@ -0,0 +1,68 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Fixed time windows
+
+The simplest form of windowing is using fixed time windows: given a 
timestamped `PCollection` which might be continuously updating, each window 
might capture (for example) all elements with timestamps that fall into a 
30-second interval.
+
+A fixed time window represents a consistent duration, non overlapping time 
interval in the data stream. Consider windows with a 30-second duration: all 
the elements in your unbounded PCollection with timestamp values from 0:00:00 
up to (but not including) 0:00:30 belong to the first window, elements with 
timestamp values from 0:00:30 up to (but not including) 0:01:00 belong to the 
second window, and so on.
+
+{{if (eq .Sdk "go")}}
+```
+fixedWindowedItems := beam.WindowInto(s,
+       window.NewFixedWindows(30*time.Second),
+       items)
+```
+{{end}}
+
+{{if (eq .Sdk "java")}}
+```
+PCollection<String> items = ...;
+    PCollection<String> fixedWindowedItems = items.apply(
+        Window.<String>into(FixedWindows.of(Duration.standardSeconds(30))));
+```
+{{end}}
+
+{{if (eq .Sdk "python")}}
+```

Review Comment:
   Same as above



##########
learning/tour-of-beam/learning-content/windowing/windowing-concept/description.md:
##########
@@ -0,0 +1,57 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Windowing
+
+Windowing subdivides a `PCollection` according to the timestamps of its 
individual elements. Transforms that aggregate multiple elements, such as 
GroupByKey and Combine, work implicitly on a per-window basis — they process 
each PCollection as a succession of multiple, finite windows, though the entire 
collection itself may be of unbounded size.
+
+Some Beam transforms, such as `GroupByKey` and `Combine`, group multiple 
elements by a common key. Ordinarily, that grouping operation groups all the 
elements that have the same key within the entire data set. With an unbounded 
data set, it is impossible to collect all the elements, since new elements are 
constantly being added and may be infinitely many (e.g. streaming data). If you 
are working with unbounded PCollections, windowing is especially useful.
+
+
+`Fixed time windows` are useful for performing time-based aggregations, such 
as counting the number of elements that arrived during each hour of the day. It 
allows you to group elements of a data set into fixed-length, non-overlapping 
time intervals, which can be useful for a variety of use cases.
+For example, imagine you have a stream of data that is recording the number of 
website visitors every second, and you want to know the total number of 
visitors for each hour of the day. Using fixed-time windows, you can group the 
data into hour-long windows and then perform a sum aggregation on each window 
to get the total number of visitors for each hour.
+
+Additionally, fixed time window can also be useful when dealing with data that 
arrives out-of-order, or when dealing with late data. By specifying a fixed 
window duration, you can ensure that all elements that belong to a particular 
window are processed together, regardless of when they arrived.

Review Comment:
   Additionally, fixed time windows can be helpful when dealing with data 
that arrives out of order or late. By specifying a fixed window duration, you 
can ensure that all elements that belong to a particular window are processed 
together, regardless of when they arrived.

   To summarize, fixed time windows help perform **time-based aggregations** and 
handle **out-of-order or late data**.
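
   A small illustration could make the out-of-order point concrete. The sketch 
below is plain Python, not Beam code; `fixed_window` and the sample `arrivals` 
are hypothetical names used only to show that window membership depends on the 
element timestamp rather than on arrival order.

```python
def fixed_window(timestamp, size):
    """Return the [start, end) fixed window that contains timestamp."""
    start = timestamp - (timestamp % size)
    return (start, start + size)


# Elements are listed in arrival order; their event timestamps are out of order.
arrivals = [("a", 5), ("b", 65), ("c", 12), ("d", 31)]

windows = {}
for name, ts in arrivals:
    # Group each element under the 30-second window its timestamp falls into.
    windows.setdefault(fixed_window(ts, 30), []).append(name)

for bounds in sorted(windows):
    print(bounds, windows[bounds])
```

Elements `a` and `c` land in the same window even though `b` arrived between 
them.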



##########
learning/tour-of-beam/learning-content/windowing/windowing-concept/description.md:
##########
@@ -0,0 +1,57 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Windowing
+
+Windowing subdivides a `PCollection` according to the timestamps of its 
individual elements. Transforms that aggregate multiple elements, such as 
GroupByKey and Combine, work implicitly on a per-window basis — they process 
each PCollection as a succession of multiple, finite windows, though the entire 
collection itself may be of unbounded size.
+
+Some Beam transforms, such as `GroupByKey` and `Combine`, group multiple 
elements by a common key. Ordinarily, that grouping operation groups all the 
elements that have the same key within the entire data set. With an unbounded 
data set, it is impossible to collect all the elements, since new elements are 
constantly being added and may be infinitely many (e.g. streaming data). If you 
are working with unbounded PCollections, windowing is especially useful.
+
+
+`Fixed time windows` are useful for performing time-based aggregations, such 
as counting the number of elements that arrived during each hour of the day. It 
allows you to group elements of a data set into fixed-length, non-overlapping 
time intervals, which can be useful for a variety of use cases.
+For example, imagine you have a stream of data that is recording the number of 
website visitors every second, and you want to know the total number of 
visitors for each hour of the day. Using fixed-time windows, you can group the 
data into hour-long windows and then perform a sum aggregation on each window 
to get the total number of visitors for each hour.
+
+Additionally, fixed time window can also be useful when dealing with data that 
arrives out-of-order, or when dealing with late data. By specifying a fixed 
window duration, you can ensure that all elements that belong to a particular 
window are processed together, regardless of when they arrived.
+
+In summary, fixed time windows are useful for performing time-based 
aggregations and for handling out-of-order or late data.
+
+
+`Sliding time windows` are similar to fixed time windows, but they have the 
added ability to move or slide over the data stream, allowing them to overlap 
with each other.
+
+One of the main use cases for sliding time windows is to compute running 
aggregates. For example, if you want to compute a running average of the past 
60 seconds’ worth of data updated every 30 seconds, you can use sliding time 
windows. This is done by defining a window duration of 60 seconds and a sliding 
interval of 30 seconds. With this configuration, you will have windows that 
slide every 30 seconds, each one covering a 60-second interval.

Review Comment:
   One of the primary use cases for sliding time windows is to **compute 
running aggregates**. For example, if you want to calculate a running average 
of the past 60 seconds’ worth of data updated every 30 seconds, you can use 
sliding time windows. You can do this by defining a window duration of 60 
seconds and a sliding interval of 30 seconds. With this configuration, you will 
have windows that slide every 30 seconds, each covering a 60-second interval.
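
   It may also help to show that each element belongs to more than one window 
with this configuration. The sketch below is plain Python rather than Beam 
code, and `sliding_windows` is a hypothetical helper that assumes windows start 
at multiples of the sliding interval.

```python
def sliding_windows(timestamp, size, period):
    """Return every [start, end) window of length `size`, starting at a
    multiple of `period`, that contains `timestamp`."""
    # Latest window start that is not after the timestamp.
    start = timestamp - (timestamp % period)
    windows = []
    # Walk backwards while the window still covers the timestamp.
    while start > timestamp - size:
        windows.append((start, start + size))
        start -= period
    return sorted(windows)


# 60-second windows that slide every 30 seconds.
print(sliding_windows(45, size=60, period=30))
print(sliding_windows(60, size=60, period=30))
```

With a 60-second duration and a 30-second interval, every timestamp falls into 
two overlapping windows.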



##########
learning/tour-of-beam/learning-content/windowing/global-window/description.md:
##########
@@ -0,0 +1,59 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### The single global window
+
+By default, all data in a `PCollection` is assigned to the single global 
window, and late data is discarded. If your data set is of a fixed size, you 
can use the global window default for your `PCollection`.
+
+You can use the single global window if you are working with an unbounded data 
set (e.g. from a streaming data source) but use caution when applying 
aggregating transforms such as `GroupByKey` and `Combine`. The single global 
window with a default trigger generally requires the entire data set to be 
available before processing, which is not possible with continuously updating 
data. To perform aggregations on an unbounded `PCollection` that uses global 
windowing, you should specify a non-default trigger for that `PCollection`.
+
+If your `PCollection` is bounded (the size is fixed), you can assign all the 
elements to a single global window. The following example code shows how to set 
a single global window for a `PCollection`:
+
+{{if (eq .Sdk "go")}}
+```

Review Comment:
   Please add a short description of how a single global window could be 
created in this particular SDK. 



##########
learning/tour-of-beam/learning-content/windowing/global-window/description.md:
##########
@@ -0,0 +1,59 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### The single global window
+
+By default, all data in a `PCollection` is assigned to the single global 
window, and late data is discarded. If your data set is of a fixed size, you 
can use the global window default for your `PCollection`.
+
+You can use the single global window if you are working with an unbounded data 
set (e.g. from a streaming data source) but use caution when applying 
aggregating transforms such as `GroupByKey` and `Combine`. The single global 
window with a default trigger generally requires the entire data set to be 
available before processing, which is not possible with continuously updating 
data. To perform aggregations on an unbounded `PCollection` that uses global 
windowing, you should specify a non-default trigger for that `PCollection`.
+
+If your `PCollection` is bounded (the size is fixed), you can assign all the 
elements to a single global window. The following example code shows how to set 
a single global window for a `PCollection`:
+
+{{if (eq .Sdk "go")}}
+```
+globalWindowedItems := beam.WindowInto(s,
+       window.NewGlobalWindows(),
+       items)
+```
+{{end}}
+
+{{if (eq .Sdk "java")}}
+```
+PCollection<String> items = ...;
+PCollection<String> batchItems = items.apply(
+  Window.<String>into(new GlobalWindows()));
+```
+{{end}}
+
+{{if (eq .Sdk "python")}}
+```
+from apache_beam import window
+global_windowed_items = (
+    items | 'window' >> beam.WindowInto(window.GlobalWindows()))
+```
+{{end}}
+
+### Playground exercise
+

Review Comment:
   Please add a description of the runnable example.



##########
learning/tour-of-beam/learning-content/windowing/windowing-concept/description.md:
##########
@@ -0,0 +1,57 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Windowing
+
+Windowing subdivides a `PCollection` according to the timestamps of its 
individual elements. Transforms that aggregate multiple elements, such as 
GroupByKey and Combine, work implicitly on a per-window basis — they process 
each PCollection as a succession of multiple, finite windows, though the entire 
collection itself may be of unbounded size.
+
+Some Beam transforms, such as `GroupByKey` and `Combine`, group multiple 
elements by a common key. Ordinarily, that grouping operation groups all the 
elements that have the same key within the entire data set. With an unbounded 
data set, it is impossible to collect all the elements, since new elements are 
constantly being added and may be infinitely many (e.g. streaming data). If you 
are working with unbounded PCollections, windowing is especially useful.
+
+
+`Fixed time windows` are useful for performing time-based aggregations, such 
as counting the number of elements that arrived during each hour of the day. It 
allows you to group elements of a data set into fixed-length, non-overlapping 
time intervals, which can be useful for a variety of use cases.
+For example, imagine you have a stream of data that is recording the number of 
website visitors every second, and you want to know the total number of 
visitors for each hour of the day. Using fixed-time windows, you can group the 
data into hour-long windows and then perform a sum aggregation on each window 
to get the total number of visitors for each hour.
+
+Additionally, fixed time window can also be useful when dealing with data that 
arrives out-of-order, or when dealing with late data. By specifying a fixed 
window duration, you can ensure that all elements that belong to a particular 
window are processed together, regardless of when they arrived.
+
+In summary, fixed time windows are useful for performing time-based 
aggregations and for handling out-of-order or late data.
+
+
+`Sliding time windows` are similar to fixed time windows, but they have the 
added ability to move or slide over the data stream, allowing them to overlap 
with each other.
+
+One of the main use cases for sliding time windows is to compute running 
aggregates. For example, if you want to compute a running average of the past 
60 seconds’ worth of data updated every 30 seconds, you can use sliding time 
windows. This is done by defining a window duration of 60 seconds and a sliding 
interval of 30 seconds. With this configuration, you will have windows that 
slide every 30 seconds, each one covering a 60-second interval.
+
+Another use case for sliding time windows is to perform anomaly detection. By 
computing the running aggregates over a sliding window, you can detect patterns 
that deviate significantly from the historical data.

Review Comment:
   Another use case for sliding time windows is to perform **anomaly 
detection**. By computing the running aggregates over a sliding window, you can 
detect patterns that deviate significantly from the historical data.
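
   A short example might make this less abstract. The sketch below is a 
simplified, count-based illustration in plain Python, not a time-based Beam 
window; `flag_anomalies`, the window length, and the threshold are all 
assumptions chosen for the example.

```python
from statistics import mean, pstdev


def flag_anomalies(values, window, threshold=3.0):
    """Return indexes of values that deviate from the mean of the previous
    `window` values by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        # Skip flat histories, where the standard deviation is zero.
        if sigma > 0 and abs(values[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


readings = [10, 11, 10, 12, 11, 10, 50, 11, 10]
print(flag_anomalies(readings, window=5))
```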



##########
learning/tour-of-beam/learning-content/windowing/windowing-concept/description.md:
##########
@@ -0,0 +1,57 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Windowing
+
+Windowing subdivides a `PCollection` according to the timestamps of its 
individual elements. Transforms that aggregate multiple elements, such as 
GroupByKey and Combine, work implicitly on a per-window basis — they process 
each PCollection as a succession of multiple, finite windows, though the entire 
collection itself may be of unbounded size.
+
+Some Beam transforms, such as `GroupByKey` and `Combine`, group multiple 
elements by a common key. Ordinarily, that grouping operation groups all the 
elements that have the same key within the entire data set. With an unbounded 
data set, it is impossible to collect all the elements, since new elements are 
constantly being added and may be infinitely many (e.g. streaming data). If you 
are working with unbounded PCollections, windowing is especially useful.
+
+
+`Fixed time windows` are useful for performing time-based aggregations, such 
as counting the number of elements that arrived during each hour of the day. It 
allows you to group elements of a data set into fixed-length, non-overlapping 
time intervals, which can be useful for a variety of use cases.
+For example, imagine you have a stream of data that is recording the number of 
website visitors every second, and you want to know the total number of 
visitors for each hour of the day. Using fixed-time windows, you can group the 
data into hour-long windows and then perform a sum aggregation on each window 
to get the total number of visitors for each hour.
+
+Additionally, fixed time window can also be useful when dealing with data that 
arrives out-of-order, or when dealing with late data. By specifying a fixed 
window duration, you can ensure that all elements that belong to a particular 
window are processed together, regardless of when they arrived.
+
+In summary, fixed time windows are useful for performing time-based 
aggregations and for handling out-of-order or late data.
+
+
+`Sliding time windows` are similar to fixed time windows, but they have the 
added ability to move or slide over the data stream, allowing them to overlap 
with each other.
+
+One of the main use cases for sliding time windows is to compute running 
aggregates. For example, if you want to compute a running average of the past 
60 seconds’ worth of data updated every 30 seconds, you can use sliding time 
windows. This is done by defining a window duration of 60 seconds and a sliding 
interval of 30 seconds. With this configuration, you will have windows that 
slide every 30 seconds, each one covering a 60-second interval.
+
+Another use case for sliding time windows is to perform anomaly detection. By 
computing the running aggregates over a sliding window, you can detect patterns 
that deviate significantly from the historical data.
+
+Sliding time windows also allows to look at data in a more dynamic way. This 
is useful when you have a high-frequency data stream and you want to look at 
the most recent data.
+
+In summary, Sliding time windows are useful for performing running 
aggregations, anomaly detection and looking at data in a more dynamic way.
+
+
+`Session windows` are a type of windowing that groups data elements based on 
periods of inactivity or "gaps" in the data stream. They are useful when you 
want to group data elements that are related to a specific event or activity 
together.
+
+One of the main use cases for session windows is to group together data 
elements that are related to a user's session on a website or application. By 
using session windows with a relatively short gap duration, you can ensure that 
all the events related to a user's session are grouped together. This allows 
you to compute session-level metrics, such as the number of pages viewed per 
session, the duration of a session, or the number of events per session.

Review Comment:
   One of the primary use cases for session windows is to **group together data 
elements related to a user's session on a website or application**. For 
example, you can use session windows with a relatively short gap duration to 
ensure that all the events related to a user's session are grouped together. 
This allows you to compute session-level metrics, such as the number of pages 
viewed per session, the duration of a session, or the number of events per 
session.
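
   A brief sketch of the gap logic could accompany this paragraph. The code 
below is plain Python, not Beam's implementation; `sessionize` is a 
hypothetical helper that assumes two events belong to the same session when 
they are less than `gap` seconds apart.

```python
def sessionize(timestamps, gap):
    """Group timestamps into sessions separated by at least `gap` of inactivity."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] < gap:
            # Still within the gap of the previous event: same session.
            sessions[-1].append(ts)
        else:
            # Inactivity of `gap` or more: start a new session.
            sessions.append([ts])
    return sessions


page_views = [0, 20, 50, 400, 410]
for session in sessionize(page_views, gap=60):
    # Session-level metrics: number of events and duration.
    print(len(session), session[-1] - session[0])
```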



##########
learning/tour-of-beam/learning-content/windowing/windowing-concept/description.md:
##########
@@ -0,0 +1,57 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Windowing
+
+Windowing subdivides a `PCollection` according to the timestamps of its 
individual elements. Transforms that aggregate multiple elements, such as 
GroupByKey and Combine, work implicitly on a per-window basis — they process 
each PCollection as a succession of multiple, finite windows, though the entire 
collection itself may be of unbounded size.
+
+Some Beam transforms, such as `GroupByKey` and `Combine`, group multiple 
elements by a common key. Ordinarily, that grouping operation groups all the 
elements that have the same key within the entire data set. With an unbounded 
data set, it is impossible to collect all the elements, since new elements are 
constantly being added and may be infinitely many (e.g. streaming data). If you 
are working with unbounded PCollections, windowing is especially useful.
+
+
+`Fixed time windows` are useful for performing time-based aggregations, such 
as counting the number of elements that arrived during each hour of the day. It 
allows you to group elements of a data set into fixed-length, non-overlapping 
time intervals, which can be useful for a variety of use cases.
+For example, imagine you have a stream of data that is recording the number of 
website visitors every second, and you want to know the total number of 
visitors for each hour of the day. Using fixed-time windows, you can group the 
data into hour-long windows and then perform a sum aggregation on each window 
to get the total number of visitors for each hour.
+
+Additionally, fixed time window can also be useful when dealing with data that 
arrives out-of-order, or when dealing with late data. By specifying a fixed 
window duration, you can ensure that all elements that belong to a particular 
window are processed together, regardless of when they arrived.
+
+In summary, fixed time windows are useful for performing time-based 
aggregations and for handling out-of-order or late data.
+
+
+`Sliding time windows` are similar to fixed time windows, but they have the 
added ability to move or slide over the data stream, allowing them to overlap 
with each other.
+
+One of the main use cases for sliding time windows is to compute running 
aggregates. For example, if you want to compute a running average of the past 
60 seconds’ worth of data updated every 30 seconds, you can use sliding time 
windows. This is done by defining a window duration of 60 seconds and a sliding 
interval of 30 seconds. With this configuration, you will have windows that 
slide every 30 seconds, each one covering a 60-second interval.
+
+Another use case for sliding time windows is to perform anomaly detection. By 
computing the running aggregates over a sliding window, you can detect patterns 
that deviate significantly from the historical data.
+
+Sliding time windows also allows to look at data in a more dynamic way. This 
is useful when you have a high-frequency data stream and you want to look at 
the most recent data.
+
+In summary, Sliding time windows are useful for performing running 
aggregations, anomaly detection and looking at data in a more dynamic way.

Review Comment:
   In summary, sliding time windows help with **running aggregations**, 
**anomaly detection**, and **looking at data more dynamically**.



##########
learning/tour-of-beam/learning-content/windowing/fixed-time-window/description.md:
##########
@@ -0,0 +1,68 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Fixed time windows
+
+The simplest form of windowing is using fixed time windows: given a 
timestamped `PCollection` which might be continuously updating, each window 
might capture (for example) all elements with timestamps that fall into a 
30-second interval.
+
+A fixed time window represents a consistent duration, non overlapping time 
interval in the data stream. Consider windows with a 30-second duration: all 
the elements in your unbounded PCollection with timestamp values from 0:00:00 
up to (but not including) 0:00:30 belong to the first window, elements with 
timestamp values from 0:00:30 up to (but not including) 0:01:00 belong to the 
second window, and so on.
+
+{{if (eq .Sdk "go")}}
+```
+fixedWindowedItems := beam.WindowInto(s,
+       window.NewFixedWindows(30*time.Second),
+       items)
+```
+{{end}}
+
+{{if (eq .Sdk "java")}}
+```

Review Comment:
   Same as above



##########
learning/tour-of-beam/learning-content/windowing/global-window/description.md:
##########
@@ -0,0 +1,59 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### The single global window
+
+By default, all data in a `PCollection` is assigned to the single global 
window, and late data is discarded. If your data set is of a fixed size, you 
can use the global window default for your `PCollection`.
+
+You can use the single global window if you are working with an unbounded data 
set (e.g. from a streaming data source) but use caution when applying 
aggregating transforms such as `GroupByKey` and `Combine`. The single global 
window with a default trigger generally requires the entire data set to be 
available before processing, which is not possible with continuously updating 
data. To perform aggregations on an unbounded `PCollection` that uses global 
windowing, you should specify a non-default trigger for that `PCollection`.
+
+If your `PCollection` is bounded (the size is fixed), you can assign all the 
elements to a single global window. The following example code shows how to set 
a single global window for a `PCollection`:
+
+{{if (eq .Sdk "go")}}
+```
+globalWindowedItems := beam.WindowInto(s,
+       window.NewGlobalWindows(),
+       items)
+```
+{{end}}
+
+{{if (eq .Sdk "java")}}
+```
+PCollection<String> items = ...;
+PCollection<String> batchItems = items.apply(
+  Window.<String>into(new GlobalWindows()));
+```
+{{end}}
+
+{{if (eq .Sdk "python")}}
+```
+from apache_beam import window
+global_windowed_items = (
+    items | 'window' >> beam.WindowInto(window.GlobalWindows()))
+```
+{{end}}
+
+### Playground exercise
+
+`CombineFn` : This function allows you to perform operations such as counting, 
summing, or finding the minimum or maximum element within a global window.
+
+`GroupByKey` : This function groups elements by a key, and allows you to apply 
a beam.CombineFn to each group of elements within a global window.
+
+`Map` : This function allows you to apply a user-defined function to each 
element within a global window.
+
+`Filter` : This function allows you to filter elements based on a user-defined 
condition, within a global window.
+
+`FlatMap` : This function allows you to apply a user-defined function to each 
element within a global window and output zero or more elements.
+
+These functions can be easily composed together to create complex data 
processing pipelines. Additionally, it's also possible to create your own 
custom functions to perform specific operations within a global window.

Review Comment:
   How do we challenge users here? It is great to describe the different 
methods that can be used, but we also need to give users a challenge.



##########
learning/tour-of-beam/learning-content/windowing/windowing-concept/description.md:
##########
@@ -0,0 +1,57 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+### Windowing
+
+Windowing subdivides a `PCollection` according to the timestamps of its 
individual elements. Transforms that aggregate multiple elements, such as 
GroupByKey and Combine, work implicitly on a per-window basis — they process 
each PCollection as a succession of multiple, finite windows, though the entire 
collection itself may be of unbounded size.
+
+Some Beam transforms, such as `GroupByKey` and `Combine`, group multiple 
elements by a common key. Ordinarily, that grouping operation groups all the 
elements that have the same key within the entire data set. With an unbounded 
data set, it is impossible to collect all the elements, since new elements are 
constantly being added and may be infinitely many (e.g. streaming data). If you 
are working with unbounded PCollections, windowing is especially useful.
+
+
+`Fixed time windows` are useful for performing time-based aggregations, such 
as counting the number of elements that arrived during each hour of the day. It 
allows you to group elements of a data set into fixed-length, non-overlapping 
time intervals, which can be useful for a variety of use cases.
+For example, imagine you have a stream of data that is recording the number of 
website visitors every second, and you want to know the total number of 
visitors for each hour of the day. Using fixed-time windows, you can group the 
data into hour-long windows and then perform a sum aggregation on each window 
to get the total number of visitors for each hour.
+
+Additionally, fixed time windows can also be useful when dealing with data that 
arrives out-of-order, or when dealing with late data. By specifying a fixed 
window duration, you can ensure that all elements that belong to a particular 
window are processed together, regardless of when they arrived.
+
+In summary, fixed time windows are useful for performing time-based 
aggregations and for handling out-of-order or late data.
+
+
+`Sliding time windows` are similar to fixed time windows, but they have the 
added ability to move or slide over the data stream, allowing them to overlap 
with each other.
+
+One of the main use cases for sliding time windows is to compute running 
aggregates. For example, if you want to compute a running average of the past 
60 seconds’ worth of data updated every 30 seconds, you can use sliding time 
windows. This is done by defining a window duration of 60 seconds and a sliding 
interval of 30 seconds. With this configuration, you will have windows that 
slide every 30 seconds, each one covering a 60-second interval.
+
+Another use case for sliding time windows is to perform anomaly detection. By 
computing the running aggregates over a sliding window, you can detect patterns 
that deviate significantly from the historical data.
+
+Sliding time windows also let you look at data in a more dynamic way. This 
is useful when you have a high-frequency data stream and you want to look at 
the most recent data.
+
+In summary, sliding time windows are useful for performing running 
aggregations, anomaly detection, and looking at data in a more dynamic way.
+
+
+`Session windows` are a type of windowing that groups data elements based on 
periods of inactivity or "gaps" in the data stream. They are useful when you 
want to group data elements that are related to a specific event or activity 
together.
+
+One of the main use cases for session windows is to group together data 
elements that are related to a user's session on a website or application. By 
using session windows with a relatively short gap duration, you can ensure that 
all the events related to a user's session are grouped together. This allows 
you to compute session-level metrics, such as the number of pages viewed per 
session, the duration of a session, or the number of events per session.
+
+Another use case for session windows is to group together data elements that 
are related to a specific device's usage. For example, if you are collecting 
sensor data, you can use session windows to group together data elements that 
are collected while the device is in use. This allows you to compute 
device-level metrics, such as the number of sensor readings per device, the 
duration of device usage, or the number of events per device.
+
+In summary, session windows are useful for grouping data elements that are 
related to specific events or activities, such as user sessions or device 
usage. This allows you to compute event- or device-level metrics.
+
+
+A `single global window` is a type of windowing that treats all data elements 
as belonging to the same window. This means that all elements in the data 
stream are processed together and no windowing is applied.
+
+The main use case for a single global window is when you want to process all 
the data elements in your data stream as a whole, without breaking them up into 
smaller windows. This can be useful in situations where you don't need to 
compute window-level metrics, such as running averages or counts, but instead 
want to process the entire data stream as a single unit.
+
+For example, if you are using a data pipeline to filter out invalid data 
elements and then store the remaining data in a database, you might use a 
single global window to process all the data elements together, without 
breaking them up into smaller windows.

Review Comment:
   For example, if you use a data pipeline to filter out invalid data elements 
and then store the remaining data in a database, you might use a single global 
window to process all the data elements together without breaking them up into 
smaller windows.
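
   To make the sliding-window and session-window paragraphs in the hunk above more concrete, a small sketch might help. The following is plain Python, not the Beam API: `sliding_windows` and `sessions` are hypothetical helpers, timestamps are assumed to be whole seconds for a single key, and the 10-second session gap is an arbitrary illustrative value. The 60-second size and 30-second period come from the text under review.

```python
from typing import List, Tuple


def sliding_windows(ts: int, size: int = 60, period: int = 30) -> List[Tuple[int, int]]:
    """Return every half-open [start, end) window that contains ts.

    Windows start at multiples of `period` and last `size` seconds,
    so consecutive windows overlap whenever size > period.
    """
    windows = []
    start = ts - (ts % period)  # latest window start that is <= ts
    while start > ts - size:    # earlier windows still cover ts
        windows.append((start, start + size))
        start -= period
    return sorted(windows)


def sessions(timestamps: List[int], gap: int = 10) -> List[List[int]]:
    """Group timestamps into sessions separated by at least `gap` seconds."""
    result: List[List[int]] = []
    for ts in sorted(timestamps):
        if result and ts - result[-1][-1] < gap:
            result[-1].append(ts)  # still inside the current session
        else:
            result.append([ts])    # inactivity reached the gap: new session
    return result


if __name__ == "__main__":
    print(sliding_windows(45))
    print(sessions([1, 3, 20, 25, 50]))
```

   With a 60-second size and a 30-second period, every timestamp lands in two overlapping windows, which is what lets a running aggregate be refreshed every 30 seconds.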



##########
learning/tour-of-beam/learning-content/windowing/fixed-time-window/description.md:
##########
@@ -0,0 +1,68 @@
+
+### Fixed time windows
+
+The simplest form of windowing is using fixed time windows: given a 
timestamped `PCollection` which might be continuously updating, each window 
might capture (for example) all elements with timestamps that fall into a 
30-second interval.
+
+A fixed time window represents a fixed-duration, non-overlapping time 
interval in the data stream. Consider windows with a 30-second duration: all 
the elements in your unbounded PCollection with timestamp values from 0:00:00 
up to (but not including) 0:00:30 belong to the first window, elements with 
timestamp values from 0:00:30 up to (but not including) 0:01:00 belong to the 
second window, and so on.
+
+{{if (eq .Sdk "go")}}
+```

Review Comment:
   Please add a description of how a fixed-time window can be created in Go.
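
   As a language-neutral illustration of the half-open interval rule described in the hunk above, here is a minimal sketch in plain Python. It does not use the Beam API: `fixed_window` is a hypothetical helper, and timestamps are assumed to be whole seconds counted from 0:00:00.

```python
from typing import Tuple


def fixed_window(ts: int, size: int = 30) -> Tuple[int, int]:
    """Return the half-open [start, end) fixed window that contains ts."""
    start = ts - (ts % size)  # round down to a multiple of the window size
    return (start, start + size)


if __name__ == "__main__":
    for ts in (0, 29, 30, 59, 60):
        print(ts, fixed_window(ts))
```

   A timestamp of 29 stays in the first window while 30 opens the second, which matches the "up to (but not including)" wording in the description.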



##########
learning/tour-of-beam/learning-content/windowing/fixed-time-window/description.md:
##########
@@ -0,0 +1,68 @@
+
+### Fixed time windows
+
+The simplest form of windowing is using fixed time windows: given a 
timestamped `PCollection` which might be continuously updating, each window 
might capture (for example) all elements with timestamps that fall into a 
30-second interval.
+
+A fixed time window represents a fixed-duration, non-overlapping time 
interval in the data stream. Consider windows with a 30-second duration: all 
the elements in your unbounded PCollection with timestamp values from 0:00:00 
up to (but not including) 0:00:30 belong to the first window, elements with 
timestamp values from 0:00:30 up to (but not including) 0:01:00 belong to the 
second window, and so on.
+
+{{if (eq .Sdk "go")}}
+```
+fixedWindowedItems := beam.WindowInto(s,
+       window.NewFixedWindows(30*time.Second),
+       items)
+```
+{{end}}
+
+{{if (eq .Sdk "java")}}
+```
+PCollection<String> items = ...;
+PCollection<String> fixedWindowedItems = items.apply(
+    Window.<String>into(FixedWindows.of(Duration.standardSeconds(30))));
+```
+{{end}}
+
+{{if (eq .Sdk "python")}}
+```
+import apache_beam as beam
+from apache_beam import window
+# FixedWindows takes the window size in seconds
+fixed_windowed_items = (
+    items | 'window' >> beam.WindowInto(window.FixedWindows(30)))
+```
+{{end}}
+
+### Playground exercise 
+

Review Comment:
   Please add a description of what the runnable example does out of the box, 
such as:
   
   In the playground window, you can try an example of how to create a 
fixed-time window and print elements in it. 



##########
learning/tour-of-beam/learning-content/windowing/global-window/description.md:
##########
@@ -0,0 +1,59 @@
+
+### The single global window
+
+By default, all data in a `PCollection` is assigned to the single global 
window, and late data is discarded. If your data set is of a fixed size, you 
can use the global window default for your `PCollection`.
+
+You can use the single global window if you are working with an unbounded data 
set (e.g. from a streaming data source) but use caution when applying 
aggregating transforms such as `GroupByKey` and `Combine`. The single global 
window with a default trigger generally requires the entire data set to be 
available before processing, which is not possible with continuously updating 
data. To perform aggregations on an unbounded `PCollection` that uses global 
windowing, you should specify a non-default trigger for that `PCollection`.
+
+If your `PCollection` is bounded (the size is fixed), you can assign all the 
elements to a single global window. The following example code shows how to set 
a single global window for a `PCollection`:
+
+{{if (eq .Sdk "go")}}
+```
+globalWindowedItems := beam.WindowInto(s,
+       window.NewGlobalWindows(),
+       items)
+```
+{{end}}
+
+{{if (eq .Sdk "java")}}
+```

Review Comment:
   Please add a short description of how a single global window could be 
created in this particular SDK. 



##########
learning/tour-of-beam/learning-content/windowing/fixed-time-window/description.md:
##########
@@ -0,0 +1,68 @@
+
+### Fixed time windows
+
+The simplest form of windowing is using fixed time windows: given a 
timestamped `PCollection` which might be continuously updating, each window 
might capture (for example) all elements with timestamps that fall into a 
30-second interval.
+
+A fixed time window represents a fixed-duration, non-overlapping time 
interval in the data stream. Consider windows with a 30-second duration: all 
the elements in your unbounded PCollection with timestamp values from 0:00:00 
up to (but not including) 0:00:30 belong to the first window, elements with 
timestamp values from 0:00:30 up to (but not including) 0:01:00 belong to the 
second window, and so on.
+
+{{if (eq .Sdk "go")}}
+```
+fixedWindowedItems := beam.WindowInto(s,
+       window.NewFixedWindows(30*time.Second),
+       items)
+```
+{{end}}
+
+{{if (eq .Sdk "java")}}
+```
+PCollection<String> items = ...;
+PCollection<String> fixedWindowedItems = items.apply(
+    Window.<String>into(FixedWindows.of(Duration.standardSeconds(30))));
+```
+{{end}}
+
+{{if (eq .Sdk "python")}}
+```
+import apache_beam as beam
+from apache_beam import window
+# FixedWindows takes the window size in seconds
+fixed_windowed_items = (
+    items | 'window' >> beam.WindowInto(window.FixedWindows(30)))
+```
+{{end}}
+
+### Playground exercise 
+
+You can start displaying elements from the beginning but also from the end:

Review Comment:
   It's not clear what the user is expected to do here. 



##########
learning/tour-of-beam/learning-content/windowing/global-window/description.md:
##########
@@ -0,0 +1,59 @@
+
+### The single global window
+
+By default, all data in a `PCollection` is assigned to the single global 
window, and late data is discarded. If your data set is of a fixed size, you 
can use the global window default for your `PCollection`.
+
+You can use the single global window if you are working with an unbounded data 
set (e.g. from a streaming data source) but use caution when applying 
aggregating transforms such as `GroupByKey` and `Combine`. The single global 
window with a default trigger generally requires the entire data set to be 
available before processing, which is not possible with continuously updating 
data. To perform aggregations on an unbounded `PCollection` that uses global 
windowing, you should specify a non-default trigger for that `PCollection`.
+
+If your `PCollection` is bounded (the size is fixed), you can assign all the 
elements to a single global window. The following example code shows how to set 
a single global window for a `PCollection`:
+
+{{if (eq .Sdk "go")}}
+```
+globalWindowedItems := beam.WindowInto(s,
+       window.NewGlobalWindows(),
+       items)
+```
+{{end}}
+
+{{if (eq .Sdk "java")}}
+```
+PCollection<String> items = ...;
+PCollection<String> batchItems = items.apply(
+  Window.<String>into(new GlobalWindows()));
+```
+{{end}}
+
+{{if (eq .Sdk "python")}}
+```

Review Comment:
   Please add a short description of how a single global window could be 
created in this particular SDK. 



##########
learning/tour-of-beam/learning-content/windowing/adding-timestamp/description.md:
##########
@@ -0,0 +1,60 @@
+
+### Adding timestamps to a PCollection’s elements
+
+An unbounded source provides a timestamp for each element. Depending on your 
unbounded source, you may need to configure how the timestamp is extracted from 
the raw data stream.
+
+However, bounded sources (such as a file from TextIO) do not provide 
timestamps. If you need timestamps, you must add them to your PCollection’s 
elements.
+
+You can assign new timestamps to the elements of a PCollection by applying a 
ParDo transform that outputs new elements with timestamps that you set.
+
+An example might be if your pipeline reads log records from an input file, and 
each log record includes a timestamp field; since your pipeline reads the 
records in from a file, the file source doesn’t assign timestamps 
automatically. You can parse the timestamp field from each record and use a 
ParDo transform with a DoFn to attach the timestamps to each element in your 
`PCollection`.

Review Comment:
   An example might be if your pipeline reads log records from an input file, 
and each log record includes a timestamp field; since your pipeline reads the 
records from a file, the file source doesn’t assign timestamps automatically. 
Instead, you can parse the timestamp field from each record and use a ParDo 
transform with a DoFn to attach the timestamps to each element in your 
`PCollection`.
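
   A short example of the parsing step might make this paragraph easier to follow. The sketch below is plain Python and only covers extracting the event time from a record. The log layout (an ISO-8601 timestamp, a space, then the message) and the helper name `extract_timestamp` are assumptions for illustration, and timestamps without an offset are treated as UTC. In the Beam Python SDK, a `DoFn` would then typically yield `beam.window.TimestampedValue(message, unix_seconds)` to attach the timestamp.

```python
from datetime import datetime, timezone
from typing import Tuple


def extract_timestamp(record: str) -> Tuple[str, float]:
    """Split '<ISO-8601 timestamp> <message>' into (message, unix_seconds)."""
    raw_ts, message = record.split(" ", 1)
    event_time = datetime.fromisoformat(raw_ts)
    if event_time.tzinfo is None:
        # Assumption: records without an explicit offset are in UTC.
        event_time = event_time.replace(tzinfo=timezone.utc)
    return message, event_time.timestamp()


if __name__ == "__main__":
    print(extract_timestamp("1970-01-01T00:01:00+00:00 GET /index.html"))
```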



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

