This is an automated email from the ASF dual-hosted git repository.
git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/asf-site by this push:
new a89b07d Publishing website 2021/12/07 06:03:36 at commit 6d63a70
a89b07d is described below
commit a89b07df119671d4dcb216913f33723d82798d96
Author: jenkins <[email protected]>
AuthorDate: Tue Dec 7 06:03:37 2021 +0000
Publishing website 2021/12/07 06:03:36 at commit 6d63a70
---
.../documentation/basics/index.html | 71 +++++++++++---
website/generated-content/documentation/index.xml | 107 ++++++++++++++++++---
.../documentation/programming-guide/index.html | 11 ++-
website/generated-content/sitemap.xml | 2 +-
4 files changed, 157 insertions(+), 34 deletions(-)
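Note on the content added below: the new Window and Watermark sections in documentation/basics describe five-minute fixed windows (0:00 to 4:59 in the first window, 5:00 or later in another), elements being conceptually duplicated into each overlapping window, late data, and window expiry. The following is a minimal sketch of those rules in plain Python. It does not use the Beam SDK, and every name in it is hypothetical. It also assumes integer-second timestamps and half-open [start, end) windows, which this commit does not state.

```python
# Illustrative only: plain Python, not the Beam SDK. All names are hypothetical.
# Assumes integer-second timestamps and half-open [start, end) windows.

WINDOW_SIZE = 5 * 60  # five-minute fixed windows, in seconds


def fixed_window(timestamp, size=WINDOW_SIZE):
    """Return the single fixed window that contains the timestamp."""
    start = timestamp - (timestamp % size)
    return (start, start + size)


def sliding_windows(timestamp, size, period):
    """Return every sliding window that contains the timestamp.

    The element is conceptually duplicated, once per returned window.
    """
    last_start = timestamp - (timestamp % period)
    # A window starting at s contains the timestamp when s > timestamp - size.
    starts = range(last_start, timestamp - size, -period)
    return [(s, s + size) for s in starts]


def is_late(timestamp, watermark, size=WINDOW_SIZE):
    """True once the watermark has passed the end of the element's window."""
    _, end = fixed_window(timestamp, size)
    return watermark >= end


def is_expired(window, watermark, allowed_lateness):
    """True when the watermark exceeds the window's maximum timestamp
    plus the allowed lateness."""
    _, end = window
    max_timestamp = end - 1  # last second inside the half-open window
    return watermark > max_timestamp + allowed_lateness
```

With these definitions, a timestamp of 4:59 (299 seconds) falls in the first window and 5:00 (300 seconds) falls in the second. An element stamped 2:00 counts as late if it arrives after the watermark has reached 5:00.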
diff --git a/website/generated-content/documentation/basics/index.html
b/website/generated-content/documentation/basics/index.html
index 89478b1..d74214d 100644
--- a/website/generated-content/documentation/basics/index.html
+++ b/website/generated-content/documentation/basics/index.html
@@ -18,7 +18,7 @@
function addPlaceholder(){$('input:text').attr('placeholder',"What are you
looking for?");}
function endSearch(){var
search=document.querySelector(".searchBar");search.classList.add("disappear");var
icons=document.querySelector("#iconsBar");icons.classList.remove("disappear");}
function blockScroll(){$("body").toggleClass("fixedPosition");}
-function openMenu(){addPlaceholder();blockScroll();}</script><div
class="clearfix container-main-content"><div class="section-nav closed"
data-offset-top=90 data-offset-bottom=500><span class="section-nav-back
glyphicon glyphicon-menu-left"></span><nav><ul class=section-nav-list
data-section-nav><li><span
class=section-nav-list-main-title>Documentation</span></li><li><a
href=/documentation>Using the Documentation</a></li><li
class=section-nav-item--collapsible><span class=section-nav-lis [...]
+function openMenu(){addPlaceholder();blockScroll();}</script><div
class="clearfix container-main-content"><div class="section-nav closed"
data-offset-top=90 data-offset-bottom=500><span class="section-nav-back
glyphicon glyphicon-menu-left"></span><nav><ul class=section-nav-list
data-section-nav><li><span
class=section-nav-list-main-title>Documentation</span></li><li><a
href=/documentation>Using the Documentation</a></li><li
class=section-nav-item--collapsible><span class=section-nav-lis [...]
data-parallel processing pipelines. To get started with Beam, you’ll
need to
understand an important set of core concepts:</p><ul><li><a
href=#pipeline><em>Pipeline</em></a> - A pipeline is a user-constructed graph of
transformations that defines the desired data processing
operations.</li><li><a href=#pcollection><em>PCollection</em></a> - A
<code>PCollection</code> is a data set or data
@@ -33,7 +33,13 @@ a <code>PCollection</code>. The schema for a
<code>PCollection</code> defines el
<code>PCollection</code> as an ordered list of named fields.</li><li><a
href=/documentation/sdks/java/><em>SDK</em></a> - A language-specific library
that lets
pipeline authors build transforms, construct their pipelines, and submit
them to a runner.</li><li><a href=#runner><em>Runner</em></a> - A runner runs
a Beam pipeline using the capabilities of
-your chosen data processing engine.</li><li><a
href=#trigger><em>Trigger</em></a> - A trigger determines when to aggregate the
results of
+your chosen data processing engine.</li><li><a
href=#window><em>Window</em></a> - A <code>PCollection</code> can be subdivided
into windows based on
+the timestamps of the individual elements. Windows enable grouping operations
+over collections that grow over time by dividing the collection into windows
+of finite collections.</li><li><a href=#watermark><em>Watermark</em></a> - A
watermark is a guess as to when all data in a
+certain window is expected to have arrived. This is needed because data isn’t
+always guaranteed to arrive in a pipeline in time order, or to always arrive
+at predictable intervals.</li><li><a href=#trigger><em>Trigger</em></a> - A
trigger determines when to aggregate the results of
each window.</li><li><a href=#state-and-timers><em>State and timers</em></a> -
Per-key state and timer callbacks
are lower level primitives that give you full control over aggregating input
collections that grow over time.</li><li><a
href=#splittable-dofn><em>Splittable DoFn</em></a> - Splittable DoFns let you
process
@@ -95,18 +101,17 @@ responsible for providing initial timestamps. The runner
must propagate and
aggregate timestamps. If the timestamp is not important, such as with certain
batch processing jobs where elements do not denote events, the timestamp will
be
the minimum representable timestamp, often referred to colloquially as
“negative
-infinity”.</p><h4 id=watermarks>Watermarks</h4><p>Every
<code>PCollection</code> must have a watermark that estimates how complete the
-<code>PCollection</code> is.</p><p>The watermark is a guess that
“we’ll never see an element with an earlier
+infinity”.</p><h4 id=watermarks>Watermarks</h4><p>Every
<code>PCollection</code> must have a <a href=#watermark>watermark</a> that
estimates how
+complete the <code>PCollection</code> is.</p><p>The watermark is a guess that
“we’ll never see an element with an earlier
timestamp”. Data sources are responsible for producing a watermark. The
runner
must implement watermark propagation as PCollections are processed, merged, and
partitioned.</p><p>The contents of a <code>PCollection</code> are complete
when a watermark advances to
“infinity”. In this manner, you can discover that an unbounded
PCollection is
-finite.</p><h4 id=windowed-elements>Windowed elements</h4><p>Every element in
a <code>PCollection</code> resides in a window. No element resides in
-multiple windows; two elements can be equal except for their window, but they
-are not the same.</p><p>When elements are read from the outside world, they
arrive in the global window.
-When they are written to the outside world, they are effectively placed back
+finite.</p><h4 id=windowed-elements>Windowed elements</h4><p>Every element in
a <code>PCollection</code> resides in a <a href=#window>window</a>. No element
+resides in multiple windows; two elements can be equal except for their window,
+but they are not the same.</p><p>When elements are written to the outside
world, they are effectively placed back
into the global window. Transforms that write data and don’t take this
-perspective probably risks data loss.</p><p>A window has a maximum timestamp.
When the watermark exceeds the maximum
+perspective risk data loss.</p><p>A window has a maximum timestamp. When the
watermark exceeds the maximum
timestamp plus the user-specified allowed lateness, the window is expired. All
data related to an expired window might be discarded at any time.</p><h4
id=coder>Coder</h4><p>Every <code>PCollection</code> has a coder, which is a
specification of the binary format
of the elements.</p><p>In Beam, the user’s pipeline can be written in a
language other than the
@@ -150,8 +155,8 @@ the transform. For example, when using <code>ParDo</code>,
user-defined code spe
operation to apply to every element. For <code>Combine</code>, it specifies
how values
should be combined. By using <a
href=/documentation/patterns/cross-language/>cross-language transforms</a>,
a Beam pipeline can contain UDFs written in a different language, or even
-multiple languages in the same pipeline.</p><p>Beam has several varieties of
UDFs:</p><ul><li><a href=/programming-guide/#pardo><em>DoFn</em></a> -
per-element processing function (used
-in <code>ParDo</code>)</li><li><a
href=/programming-guide/#setting-your-pcollections-windowing-function><em>WindowFn</em></a>
-
+multiple languages in the same pipeline.</p><p>Beam has several varieties of
UDFs:</p><ul><li><a
href=/documentation/programming-guide/#pardo><em>DoFn</em></a> - per-element
processing
+function (used in <code>ParDo</code>)</li><li><a
href=/documentation/programming-guide/#setting-your-pcollections-windowing-function><em>WindowFn</em></a>
-
places elements in windows and merges windows (used in <code>Window</code> and
<code>GroupByKey</code>)</li><li><a
href=/documentation/programming-guide/#side-inputs><em>ViewFn</em></a> - adapts
a
materialized <code>PCollection</code> to a particular interface (used in side
inputs)</li><li><a
href=/documentation/programming-guide/#side-inputs-windowing><em>WindowMappingFn</em></a>
-
@@ -167,7 +172,7 @@ without communicating or sharing state with any of the
other copies. Each copy
of your user code function might be retried or run multiple times, depending on
the pipeline runner and the processing backend that you choose for your
pipeline. Beam also supports stateful processing through the
-<a href=/blog/stateful-processing/>stateful processing API</a>.</p><p>For more
information about user-defined functions, see the following
pages:</p><ul><li><a
href=/documentation/programming-guide/#requirements-for-writing-user-code-for-beam-transforms>Requirements
for writing user code for Beam transforms</a></li><li><a
href=/documentation/programming-guide/#pardo>Beam Programming Guide:
ParDo</a></li><li><a
href=/programming-guide/#setting-your-pcollections-windowing-function>Beam Pro
[...]
+<a href=/blog/stateful-processing/>stateful processing API</a>.</p><p>For more
information about user-defined functions, see the following
pages:</p><ul><li><a
href=/documentation/programming-guide/#requirements-for-writing-user-code-for-beam-transforms>Requirements
for writing user code for Beam transforms</a></li><li><a
href=/documentation/programming-guide/#pardo>Beam Programming Guide:
ParDo</a></li><li><a
href=/documentation/programming-guide/#setting-your-pcollections-windowing-fun
[...]
schema for a <code>PCollection</code> defines elements of that
<code>PCollection</code> as an ordered
list of named fields. Each field has a name, a type, and possibly a set of user
options.</p><p>In many cases, the element type in a <code>PCollection</code>
has a structure that can be
@@ -188,7 +193,45 @@ Flink runner translates a Beam pipeline into a Flink job.
The Direct Runner runs
pipelines locally so you can test, debug, and validate that your pipeline
adheres to the Apache Beam model as closely as possible.</p><p>For an
up-to-date list of Beam runners and which features of the Apache Beam
model they support, see the runner
-<a href=/documentation/runners/capability-matrix/>capability
matrix</a>.</p><p>For more information about runners, see the following
pages:</p><ul><li><a href=/documentation/#choosing-a-runner>Choosing a
Runner</a></li><li><a href=/documentation/runners/capability-matrix/>Beam
Capability Matrix</a></li></ul><h3 id=trigger>Trigger</h3><p>When collecting
and grouping data into windows, Beam uses <em>triggers</em> to
+<a href=/documentation/runners/capability-matrix/>capability
matrix</a>.</p><p>For more information about runners, see the following
pages:</p><ul><li><a href=/documentation/#choosing-a-runner>Choosing a
Runner</a></li><li><a href=/documentation/runners/capability-matrix/>Beam
Capability Matrix</a></li></ul><h3 id=window>Window</h3><p>Windowing subdivides
a <code>PCollection</code> into <em>windows</em> according to the timestamps
+of its individual elements. Windows enable grouping operations over unbounded
+collections by dividing the collection into windows of finite
collections.</p><p>A <em>windowing function</em> tells the runner how to assign
elements to one or more
+initial windows, and how to merge windows of grouped elements. Each element in
a
+<code>PCollection</code> can only be in one window, so if a windowing function
specifies
+multiple windows for an element, the element is conceptually duplicated into
+each of the windows and each element is identical except for its
window.</p><p>Transforms that aggregate multiple elements, such as
<code>GroupByKey</code> and <code>Combine</code>,
+work implicitly on a per-window basis; they process each
<code>PCollection</code> as a
+succession of multiple, finite windows, though the entire collection itself may
+be of unbounded size.</p><p>Beam provides several windowing
functions:</p><ul><li><strong>Fixed time windows</strong> (also known as
“tumbling windows”) represent a consistent
+duration, non-overlapping time interval in the data
stream.</li><li><strong>Sliding time windows</strong> (also known as
“hopping windows”) also represent time
+intervals in the data stream; however, sliding time windows can
overlap.</li><li><strong>Per-session windows</strong> define windows that
contain elements that are within a
+certain gap duration of another element.</li><li><strong>Single global
window</strong>: by default, all data in a <code>PCollection</code> is assigned
to
+the single global window, and late data is
discarded.</li><li><strong>Calendar-based windows</strong> (not supported by
the Beam SDK for Python)</li></ul><p>You can also define your own windowing
function if you have more complex
+requirements.</p><p>For example, let’s say we have a
<code>PCollection</code> that uses fixed-time windowing,
+with windows that are five minutes long. For each window, Beam must collect all
+the data with an event time timestamp in the given window range (between 0:00
+and 4:59 in the first window, for instance). Data with timestamps outside that
+range (data from 5:00 or later) belongs to a different window.</p><p>Two
concepts are closely related to windowing and covered in the following
+sections: <a href=#watermark>watermarks</a> and <a
href=#trigger>triggers</a>.</p><p>For more information about windows, see the
following page:</p><ul><li><a
href=/documentation/programming-guide/#windowing>Beam Programming Guide:
Windowing</a></li><li><a
href=/documentation/programming-guide/#setting-your-pcollections-windowing-function>Beam
Programming Guide: WindowFn</a></li></ul><h3 id=watermark>Watermark</h3><p>In
any data processing system, there is a certain amount of lag between [...]
+a data event occurs (the “event time”, determined by the timestamp on the data
+element itself) and the time the actual data element gets processed at any
stage
+in your pipeline (the “processing time”, determined by the clock on the system
+processing the element). In addition, data isn’t always guaranteed to arrive in
+a pipeline in time order, or to always arrive at predictable intervals. For
+example, you might have intermediate systems that don’t preserve order,
or you
+might have two servers that timestamp data but one has a better network
+connection.</p><p>To address this potential unpredictability, Beam tracks a
<em>watermark</em>. A
+watermark is a guess as to when all data in a certain window is expected to
have
+arrived in the pipeline. You can also think of this as “we’ll never see an
+element with an earlier timestamp”.</p><p>Data sources are responsible for
producing a watermark, and every <code>PCollection</code>
+must have a watermark that estimates how complete the <code>PCollection</code>
is. The
+contents of a <code>PCollection</code> are complete when a watermark advances
to
+“infinity”. In this manner, you might discover that an unbounded
<code>PCollection</code>
+is finite. After the watermark progresses past the end of a window, any further
+element that arrives with a timestamp in that window is considered <em>late
data</em>.</p><p>Triggers are a related concept that allow you to modify and
refine the windowing
+strategy for a <code>PCollection</code>. You can use triggers to decide when
each
+individual window aggregates and reports its results, including how the window
+emits late elements.</p><p>For more information about watermarks, see the
following page:</p><ul><li><a
href=/documentation/programming-guide/#watermarks-and-late-data>Beam
Programming Guide: Watermarks and late data</a></li></ul><h3
id=trigger>Trigger</h3><p>When collecting and grouping data into windows, Beam
uses <em>triggers</em> to
determine when to emit the aggregated results of each window (referred to as a
<em>pane</em>). If you use Beam’s default windowing configuration and default
trigger,
Beam outputs the aggregated result when it estimates all data has arrived, and
@@ -259,7 +302,7 @@ checkpoint the sub-element and the runner repeats step
2.</li></ol><p>You can al
processing. For example, if you write a splittable <code>DoFn</code> to watch
a set of
directories and output filenames as they arrive, you can split to subdivide the
work of different directories. This allows the runner to split off a hot
-directory and give it additional resources.</p><p>For more information about
Splittable <code>DoFn</code>, see the following pages:</p><ul><li><a
href=/documentation/programming-guide/#splittable-dofns>Splittable
DoFns</a></li><li><a href=/blog/splittable-do-fn-is-available/>Splittable DoFn
in Apache Beam is Ready to Use</a></li></ul><div class=feedback><p
class=update>Last updated on 2021/10/21</p><h3>Have you found everything you
were looking for?</h3><p class=description>Was it all us [...]
+directory and give it additional resources.</p><p>For more information about
Splittable <code>DoFn</code>, see the following pages:</p><ul><li><a
href=/documentation/programming-guide/#splittable-dofns>Splittable
DoFns</a></li><li><a href=/blog/splittable-do-fn-is-available/>Splittable DoFn
in Apache Beam is Ready to Use</a></li></ul><div class=feedback><p
class=update>Last updated on 2021/12/06</p><h3>Have you found everything you
were looking for?</h3><p class=description>Was it all us [...]
<a href=http://www.apache.org>The Apache Software Foundation</a>
| <a href=/privacy_policy>Privacy Policy</a>
| <a href=/feed.xml>RSS Feed</a><br><br>Apache Beam, Apache, Beam, the Beam
logo, and the Apache feather logo are either registered trademarks or
trademarks of The Apache Software Foundation. All other products or name brands
are trademarks of their respective holders, including The Apache Software
Foundation.</div></div></div></div></footer></body></html>
\ No newline at end of file
diff --git a/website/generated-content/documentation/index.xml
b/website/generated-content/documentation/index.xml
index fc279a0..53c27a4 100644
--- a/website/generated-content/documentation/index.xml
+++ b/website/generated-content/documentation/index.xml
@@ -3205,6 +3205,14 @@ pipeline authors build transforms, construct their
pipelines, and submit
them to a runner.</li>
<li><a href="#runner"><em>Runner</em></a> - A runner runs a
Beam pipeline using the capabilities of
your chosen data processing engine.</li>
+<li><a href="#window"><em>Window</em></a> - A
<code>PCollection</code> can be subdivided into windows based on
+the timestamps of the individual elements. Windows enable grouping operations
+over collections that grow over time by dividing the collection into windows
+of finite collections.</li>
+<li><a href="#watermark"><em>Watermark</em></a> - A watermark
is a guess as to when all data in a
+certain window is expected to have arrived. This is needed because data isn’t
+always guaranteed to arrive in a pipeline in time order, or to always arrive
+at predictable intervals.</li>
<li><a href="#trigger"><em>Trigger</em></a> - A trigger
determines when to aggregate the results of
each window.</li>
<li><a href="#state-and-timers"><em>State and timers</em></a> -
Per-key state and timer callbacks
@@ -3316,8 +3324,8 @@ batch processing jobs where elements do not denote
events, the timestamp will be
the minimum representable timestamp, often referred to colloquially as
&ldquo;negative
infinity&rdquo;.</p>
<h4 id="watermarks">Watermarks</h4>
-<p>Every <code>PCollection</code> must have a watermark that
estimates how complete the
-<code>PCollection</code> is.</p>
+<p>Every <code>PCollection</code> must have a <a
href="#watermark">watermark</a> that estimates how
+complete the <code>PCollection</code> is.</p>
<p>The watermark is a guess that &ldquo;we&rsquo;ll never see an
element with an earlier
timestamp&rdquo;. Data sources are responsible for producing a watermark.
The runner
must implement watermark propagation as PCollections are processed, merged, and
@@ -3326,13 +3334,12 @@ partitioned.</p>
&ldquo;infinity&rdquo;. In this manner, you can discover that an
unbounded PCollection is
finite.</p>
<h4 id="windowed-elements">Windowed elements</h4>
-<p>Every element in a <code>PCollection</code> resides in a window.
No element resides in
-multiple windows; two elements can be equal except for their window, but they
-are not the same.</p>
-<p>When elements are read from the outside world, they arrive in the global
window.
-When they are written to the outside world, they are effectively placed back
+<p>Every element in a <code>PCollection</code> resides in a <a
href="#window">window</a>. No element
+resides in multiple windows; two elements can be equal except for their window,
+but they are not the same.</p>
+<p>When elements are written to the outside world, they are effectively
placed back
into the global window. Transforms that write data and don&rsquo;t take
this
-perspective probably risks data loss.</p>
+perspective risk data loss.</p>
<p>A window has a maximum timestamp. When the watermark exceeds the maximum
timestamp plus the user-specified allowed lateness, the window is expired. All
data related to an expired window might be discarded at any time.</p>
@@ -3410,9 +3417,9 @@ a Beam pipeline can contain UDFs written in a different
language, or even
multiple languages in the same pipeline.</p>
<p>Beam has several varieties of UDFs:</p>
<ul>
-<li><a href="/programming-guide/#pardo"><em>DoFn</em></a> -
per-element processing function (used
-in <code>ParDo</code>)</li>
-<li><a
href="/programming-guide/#setting-your-pcollections-windowing-function"><em>WindowFn</em></a>
-
+<li><a
href="/documentation/programming-guide/#pardo"><em>DoFn</em></a> -
per-element processing
+function (used in <code>ParDo</code>)</li>
+<li><a
href="/documentation/programming-guide/#setting-your-pcollections-windowing-function"><em>WindowFn</em></a>
-
places elements in windows and merges windows (used in
<code>Window</code> and
<code>GroupByKey</code>)</li>
<li><a
href="/documentation/programming-guide/#side-inputs"><em>ViewFn</em></a>
- adapts a
@@ -3439,7 +3446,7 @@ pipeline. Beam also supports stateful processing through
the
<ul>
<li><a
href="/documentation/programming-guide/#requirements-for-writing-user-code-for-beam-transforms">Requirements
for writing user code for Beam transforms</a></li>
<li><a href="/documentation/programming-guide/#pardo">Beam Programming
Guide: ParDo</a></li>
-<li><a
href="/programming-guide/#setting-your-pcollections-windowing-function">Beam
Programming Guide: WindowFn</a></li>
+<li><a
href="/documentation/programming-guide/#setting-your-pcollections-windowing-function">Beam
Programming Guide: WindowFn</a></li>
<li><a href="/documentation/programming-guide/#combine">Beam Programming
Guide: CombineFn</a></li>
<li><a
href="/documentation/programming-guide/#data-encoding-and-type-safety">Beam
Programming Guide: Coder</a></li>
<li><a href="/documentation/programming-guide/#side-inputs">Beam
Programming Guide: Side inputs</a></li>
@@ -3482,6 +3489,73 @@ model they support, see the runner
<li><a href="/documentation/#choosing-a-runner">Choosing a
Runner</a></li>
<li><a href="/documentation/runners/capability-matrix/">Beam Capability
Matrix</a></li>
</ul>
+<h3 id="window">Window</h3>
+<p>Windowing subdivides a <code>PCollection</code> into
<em>windows</em> according to the timestamps
+of its individual elements. Windows enable grouping operations over unbounded
+collections by dividing the collection into windows of finite
collections.</p>
+<p>A <em>windowing function</em> tells the runner how to assign
elements to one or more
+initial windows, and how to merge windows of grouped elements. Each element in
a
+<code>PCollection</code> can only be in one window, so if a windowing
function specifies
+multiple windows for an element, the element is conceptually duplicated into
+each of the windows and each element is identical except for its window.</p>
+<p>Transforms that aggregate multiple elements, such as
<code>GroupByKey</code> and <code>Combine</code>,
+work implicitly on a per-window basis; they process each
<code>PCollection</code> as a
+succession of multiple, finite windows, though the entire collection itself may
+be of unbounded size.</p>
+<p>Beam provides several windowing functions:</p>
+<ul>
+<li><strong>Fixed time windows</strong> (also known as
&ldquo;tumbling windows&rdquo;) represent a consistent
+duration, non-overlapping time interval in the data stream.</li>
+<li><strong>Sliding time windows</strong> (also known as
&ldquo;hopping windows&rdquo;) also represent time
+intervals in the data stream; however, sliding time windows can
overlap.</li>
+<li><strong>Per-session windows</strong> define windows that contain
elements that are within a
+certain gap duration of another element.</li>
+<li><strong>Single global window</strong>: by default, all data in a
<code>PCollection</code> is assigned to
+the single global window, and late data is discarded.</li>
+<li><strong>Calendar-based windows</strong> (not supported by the
Beam SDK for Python)</li>
+</ul>
+<p>You can also define your own windowing function if you have more complex
+requirements.</p>
+<p>For example, let&rsquo;s say we have a
<code>PCollection</code> that uses fixed-time windowing,
+with windows that are five minutes long. For each window, Beam must collect all
+the data with an event time timestamp in the given window range (between 0:00
+and 4:59 in the first window, for instance). Data with timestamps outside that
+range (data from 5:00 or later) belongs to a different window.</p>
+<p>Two concepts are closely related to windowing and covered in the
following
+sections: <a href="#watermark">watermarks</a> and <a
href="#trigger">triggers</a>.</p>
+<p>For more information about windows, see the following page:</p>
+<ul>
+<li><a href="/documentation/programming-guide/#windowing">Beam
Programming Guide: Windowing</a></li>
+<li><a
href="/documentation/programming-guide/#setting-your-pcollections-windowing-function">Beam
Programming Guide: WindowFn</a></li>
+</ul>
+<h3 id="watermark">Watermark</h3>
+<p>In any data processing system, there is a certain amount of lag between
the time
+a data event occurs (the “event time”, determined by the timestamp on the data
+element itself) and the time the actual data element gets processed at any
stage
+in your pipeline (the “processing time”, determined by the clock on the system
+processing the element). In addition, data isn’t always guaranteed to arrive in
+a pipeline in time order, or to always arrive at predictable intervals. For
+example, you might have intermediate systems that don&rsquo;t preserve
order, or you
+might have two servers that timestamp data but one has a better network
+connection.</p>
+<p>To address this potential unpredictability, Beam tracks a
<em>watermark</em>. A
+watermark is a guess as to when all data in a certain window is expected to
have
+arrived in the pipeline. You can also think of this as “we’ll never see an
+element with an earlier timestamp”.</p>
+<p>Data sources are responsible for producing a watermark, and every
<code>PCollection</code>
+must have a watermark that estimates how complete the
<code>PCollection</code> is. The
+contents of a <code>PCollection</code> are complete when a watermark
advances to
+“infinity”. In this manner, you might discover that an unbounded
<code>PCollection</code>
+is finite. After the watermark progresses past the end of a window, any further
+element that arrives with a timestamp in that window is considered <em>late
data</em>.</p>
+<p>Triggers are a related concept that allow you to modify and refine the
windowing
+strategy for a <code>PCollection</code>. You can use triggers to decide
when each
+individual window aggregates and reports its results, including how the window
+emits late elements.</p>
+<p>For more information about watermarks, see the following page:</p>
+<ul>
+<li><a
href="/documentation/programming-guide/#watermarks-and-late-data">Beam
Programming Guide: Watermarks and late data</a></li>
+</ul>
<h3 id="trigger">Trigger</h3>
<p>When collecting and grouping data into windows, Beam uses
<em>triggers</em> to
determine when to emit the aggregated results of each window (referred to as a
@@ -8893,9 +8967,12 @@ window.</p>
</ul>
<p>You can also define your own <code>WindowFn</code> if you have a
more complex need.</p>
<p>Note that each element can logically belong to more than one window,
depending
-on the windowing function you use. Sliding time windowing, for example, creates
-overlapping windows wherein a single element can be assigned to multiple
-windows.</p>
+on the windowing function you use. Sliding time windowing, for example, can
+create overlapping windows wherein a single element can be assigned to multiple
+windows. However, each element in a <code>PCollection</code> can only be
in one window, so
+if an element is assigned to multiple windows, the element is conceptually
+duplicated into each of the windows and each element is identical except for
its
+window.</p>
<h4 id="fixed-time-windows">8.2.1. Fixed time windows</h4>
<p>The simplest form of windowing is using <strong>fixed time
windows</strong>: given a
timestamped <code>PCollection</code> which might be continuously
updating, each window
diff --git
a/website/generated-content/documentation/programming-guide/index.html
b/website/generated-content/documentation/programming-guide/index.html
index 1ef8286..82acce2 100644
--- a/website/generated-content/documentation/programming-guide/index.html
+++ b/website/generated-content/documentation/programming-guide/index.html
@@ -2558,9 +2558,12 @@ for that <code>PCollection</code>. The
<code>GroupByKey</code> transform groups
subsequent <code>ParDo</code> transform gets applied multiple times per key,
once for each
window.</p><h3 id=provided-windowing-functions>8.2. Provided windowing
functions</h3><p>You can define different kinds of windows to divide the
elements of your
<code>PCollection</code>. Beam provides several windowing functions,
including:</p><ul><li>Fixed Time Windows</li><li>Sliding Time
Windows</li><li>Per-Session Windows</li><li>Single Global
Window</li><li>Calendar-based Windows (not supported by the Beam SDK for Python
or Go)</li></ul><p>You can also define your own <code>WindowFn</code> if you
have a more complex need.</p><p>Note that each element can logically belong to
more than one window, depending
-on the windowing function you use. Sliding time windowing, for example, creates
-overlapping windows wherein a single element can be assigned to multiple
-windows.</p><h4 id=fixed-time-windows>8.2.1. Fixed time windows</h4><p>The
simplest form of windowing is using <strong>fixed time windows</strong>: given a
+on the windowing function you use. Sliding time windowing, for example, can
+create overlapping windows wherein a single element can be assigned to multiple
+windows. However, each element in a <code>PCollection</code> can only be in
one window, so
+if an element is assigned to multiple windows, the element is conceptually
+duplicated into each of the windows and each element is identical except for
its
+window.</p><h4 id=fixed-time-windows>8.2.1. Fixed time windows</h4><p>The
simplest form of windowing is using <strong>fixed time windows</strong>: given a
timestamped <code>PCollection</code> which might be continuously updating,
each window
might capture (for example) all elements with timestamps that fall into a 30
second interval.</p><p>A fixed time window represents a consistent duration,
non overlapping time
@@ -4307,7 +4310,7 @@ expansionAddr := "localhost:8097"
outT := beam.UnnamedOutput(typex.New(reflectx.String))
res := beam.CrossLanguage(s, urn, payload, expansionAddr,
beam.UnnamedInput(inputPCol), outT)
</code></pre></div></div></li><li><p>After the job has been submitted to
the Beam runner, shutdown the expansion service by
-terminating the expansion service process.</p></li></ol><h3
id=x-lang-transform-runner-support>13.3. Runner Support</h3><p>Currently,
portable runners such as Flink, Spark, and the Direct runner can be used with
multi-language pipelines.</p><p>Google Cloud Dataflow supports multi-language
pipelines through the Dataflow Runner v2 backend architecture.</p><div
class=feedback><p class=update>Last updated on 2021/11/18</p><h3>Have you found
everything you were looking for?</h3><p class=descr [...]
+terminating the expansion service process.</p></li></ol><h3
id=x-lang-transform-runner-support>13.3. Runner Support</h3><p>Currently,
portable runners such as Flink, Spark, and the Direct runner can be used with
multi-language pipelines.</p><p>Google Cloud Dataflow supports multi-language
pipelines through the Dataflow Runner v2 backend architecture.</p><div
class=feedback><p class=update>Last updated on 2021/12/06</p><h3>Have you found
everything you were looking for?</h3><p class=descr [...]
<a href=http://www.apache.org>The Apache Software Foundation</a>
| <a href=/privacy_policy>Privacy Policy</a>
| <a href=/feed.xml>RSS Feed</a><br><br>Apache Beam, Apache, Beam, the Beam
logo, and the Apache feather logo are either registered trademarks or
trademarks of The Apache Software Foundation. All other products or name brands
are trademarks of their respective holders, including The Apache Software
Foundation.</div></div></div></div></footer></body></html>
\ No newline at end of file
diff --git a/website/generated-content/sitemap.xml
b/website/generated-content/sitemap.xml
index 78bbfff..5047cde 100644
--- a/website/generated-content/sitemap.xml
+++ b/website/generated-content/sitemap.xml
@@ -1 +1 @@
-<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.34.0/</loc><lastmod>2021-11-11T11:07:06-08:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2021-11-11T11:07:06-08:00</lastmod></url><url><loc>/blog/</loc><lastmod>2021-11-11T11:07:06-08:00</lastmod></url><url><loc>/categories/</loc><lastmod>2021-12-01T21:32:04+03:00</lastmod></url><url><loc>/blog/g
[...]
\ No newline at end of file
+<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.34.0/</loc><lastmod>2021-11-11T11:07:06-08:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2021-11-11T11:07:06-08:00</lastmod></url><url><loc>/blog/</loc><lastmod>2021-11-11T11:07:06-08:00</lastmod></url><url><loc>/categories/</loc><lastmod>2021-12-01T21:32:04+03:00</lastmod></url><url><loc>/blog/g
[...]
\ No newline at end of file