kennknowles commented on a change in pull request #15778:
URL: https://github.com/apache/beam/pull/15778#discussion_r745036962



##########
File path: website/www/site/content/en/documentation/basics.md
##########
@@ -365,6 +373,76 @@ For more information about runners, see the following 
pages:
  * [Choosing a Runner](/documentation/#choosing-a-runner)
  * [Beam Capability Matrix](/documentation/runners/capability-matrix/)
 
+### Window
+
+Windowing subdivides a `PCollection` into _windows_ according to the timestamps
+of its individual elements. Windows enable grouping operations over unbounded
+collections by dividing the collection into windows of finite collections. A
+windowing function tells the runner how to assign elements to an initial 
window,

Review comment:
       one or more initial windows (maybe even zero is possible but that is 
wonky)

##########
File path: website/www/site/content/en/documentation/basics.md
##########
@@ -365,6 +373,76 @@ For more information about runners, see the following 
pages:
  * [Choosing a Runner](/documentation/#choosing-a-runner)
  * [Beam Capability Matrix](/documentation/runners/capability-matrix/)
 
+### Window
+
+Windowing subdivides a `PCollection` into _windows_ according to the timestamps
+of its individual elements. Windows enable grouping operations over unbounded
+collections by dividing the collection into windows of finite collections. A
+windowing function tells the runner how to assign elements to an initial 
window,
+and how to merge windows of grouped elements. Two concepts are closely related
+to windowing: [watermarks](#watermark) and triggers.
+
+Transforms that aggregate multiple elements, such as `GroupByKey` and 
`Combine`,
+work implicitly on a per-window basis; they process each `PCollection` as a
+succession of multiple, finite windows, though the entire collection itself may
+be of unbounded size.
+
+Beam provides several windowing functions:
+
+ * **Fixed time windows** (also known as "tumbling windows") represent a 
consistent
+   duration, non overlapping time interval in the data stream.
+ * **Sliding time windows** (also known as "hopping windows") also represent 
time
+   intervals in the data stream; however, sliding time windows can overlap.
+ * **Per-session windows** define windows that contain elements that are 
within a
+   certain gap duration of another element.
+ * **Single global window**: by default, all data in a `PCollection` is 
assigned to
+   the single global window, and late data is discarded.
+ * **Calendar-based windows** (not supported by the Beam SDK for Python)
+
+You can also define your own windowing function if you have more complex
+requirements.
+
+For more information about windows, see the following page:
+
+ * [Beam Programming Guide: 
Windowing](/documentation/programming-guide/#windowing)
+
+### Watermark
+
+In any data processing system, there is a certain amount of lag between the 
time
+a data event occurs (the “event time”, determined by the timestamp on the data
+element itself) and the time the actual data element gets processed at any 
stage
+in your pipeline (the “processing time”, determined by the clock on the system
+processing the element). In addition, there are no guarantees that data events
+will appear in your pipeline in the same order that they were generated.
+
+For example, let’s say we have a PCollection that’s using fixed-time windowing,
+with windows that are five minutes long. For each window, Beam must collect all
+the data with an event time timestamp in the given window range (between 0:00
+and 4:59 in the first window, for instance). Data with timestamps outside that
+range (data from 5:00 or later) belong to a different window.
+
+However, data isn’t always guaranteed to arrive in a pipeline in time order, or

Review comment:
       down here you say "isn't always guaranteed" which seems a good phrasing

##########
File path: website/www/site/content/en/documentation/basics.md
##########
@@ -365,6 +373,76 @@ For more information about runners, see the following 
pages:
  * [Choosing a Runner](/documentation/#choosing-a-runner)
  * [Beam Capability Matrix](/documentation/runners/capability-matrix/)
 
+### Window
+
+Windowing subdivides a `PCollection` into _windows_ according to the timestamps
+of its individual elements. Windows enable grouping operations over unbounded
+collections by dividing the collection into windows of finite collections. A
+windowing function tells the runner how to assign elements to an initial 
window,
+and how to merge windows of grouped elements. Two concepts are closely related
+to windowing: [watermarks](#watermark) and triggers.
+
+Transforms that aggregate multiple elements, such as `GroupByKey` and 
`Combine`,
+work implicitly on a per-window basis; they process each `PCollection` as a
+succession of multiple, finite windows, though the entire collection itself may
+be of unbounded size.
+
+Beam provides several windowing functions:
+
+ * **Fixed time windows** (also known as "tumbling windows") represent a 
consistent
+   duration, non overlapping time interval in the data stream.
+ * **Sliding time windows** (also known as "hopping windows") also represent 
time
+   intervals in the data stream; however, sliding time windows can overlap.
+ * **Per-session windows** define windows that contain elements that are 
within a
+   certain gap duration of another element.
+ * **Single global window**: by default, all data in a `PCollection` is 
assigned to
+   the single global window, and late data is discarded.
+ * **Calendar-based windows** (not supported by the Beam SDK for Python)
+
+You can also define your own windowing function if you have more complex
+requirements.
+
+For more information about windows, see the following page:
+
+ * [Beam Programming Guide: 
Windowing](/documentation/programming-guide/#windowing)
+
+### Watermark
+
+In any data processing system, there is a certain amount of lag between the 
time
+a data event occurs (the “event time”, determined by the timestamp on the data
+element itself) and the time the actual data element gets processed at any 
stage
+in your pipeline (the “processing time”, determined by the clock on the system
+processing the element). In addition, there are no guarantees that data events

Review comment:
       The guarantees depend on the entirety of the system - sometimes there 
might be. Maybe "in many cases, data events can appear in your pipeline in a 
different order than they were generated". Could also be worth mentioning 
examples like intermediate systems that don't preserve order but also that data 
might come from different sources. If two independent servers are timestamping 
things and one has a better connection they won't be merged into an in-order 
stream.

##########
File path: website/www/site/content/en/documentation/basics.md
##########
@@ -42,6 +42,14 @@ understand an important set of core concepts:
    them to a runner.
  * [_Runner_](#runner) - A runner runs a Beam pipeline using the capabilities 
of
    your chosen data processing engine.
+ * [_Window_](#window) - A `PCollection` can be subdivided into windows based 
on

Review comment:
       Technically not always "subdivided" per se. But TBH close enough, if you 
consider sliding windows to duplicate elements and then subdivide that result 
of that. I don't know how pedantic you want to be. I started writing this 
comment thinking we should change it but now I think keeping it as-is might be 
the best concise way of helping someone understand what windows are and the 
purpose they serve.

##########
File path: website/www/site/content/en/documentation/basics.md
##########
@@ -365,6 +373,76 @@ For more information about runners, see the following 
pages:
  * [Choosing a Runner](/documentation/#choosing-a-runner)
  * [Beam Capability Matrix](/documentation/runners/capability-matrix/)
 
+### Window
+
+Windowing subdivides a `PCollection` into _windows_ according to the timestamps
+of its individual elements. Windows enable grouping operations over unbounded
+collections by dividing the collection into windows of finite collections. A
+windowing function tells the runner how to assign elements to an initial 
window,
+and how to merge windows of grouped elements. Two concepts are closely related
+to windowing: [watermarks](#watermark) and triggers.
+
+Transforms that aggregate multiple elements, such as `GroupByKey` and 
`Combine`,

Review comment:
       This is really really clear. Nice

##########
File path: website/www/site/content/en/documentation/basics.md
##########
@@ -365,6 +373,76 @@ For more information about runners, see the following 
pages:
  * [Choosing a Runner](/documentation/#choosing-a-runner)
  * [Beam Capability Matrix](/documentation/runners/capability-matrix/)
 
+### Window
+
+Windowing subdivides a `PCollection` into _windows_ according to the timestamps
+of its individual elements. Windows enable grouping operations over unbounded
+collections by dividing the collection into windows of finite collections. A
+windowing function tells the runner how to assign elements to an initial 
window,
+and how to merge windows of grouped elements. Two concepts are closely related
+to windowing: [watermarks](#watermark) and triggers.
+
+Transforms that aggregate multiple elements, such as `GroupByKey` and 
`Combine`,
+work implicitly on a per-window basis; they process each `PCollection` as a
+succession of multiple, finite windows, though the entire collection itself may
+be of unbounded size.
+
+Beam provides several windowing functions:
+
+ * **Fixed time windows** (also known as "tumbling windows") represent a 
consistent
+   duration, non overlapping time interval in the data stream.

Review comment:
       non-overlapping? (hyphen)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to