This is an automated email from the ASF dual-hosted git repository.
git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/asf-site by this push:
new b5fbe9e Publishing website 2021/10/27 00:01:45 at commit 5cb634e
b5fbe9e is described below
commit b5fbe9e0dc7f19c65d536b81f83089af018a322f
Author: jenkins <[email protected]>
AuthorDate: Wed Oct 27 00:01:46 2021 +0000
Publishing website 2021/10/27 00:01:45 at commit 5cb634e
---
.../documentation/basics/index.html | 26 +++++++++++++--
website/generated-content/documentation/index.xml | 38 ++++++++++++++++++++++
website/generated-content/sitemap.xml | 2 +-
3 files changed, 62 insertions(+), 4 deletions(-)
diff --git a/website/generated-content/documentation/basics/index.html b/website/generated-content/documentation/basics/index.html
index 14a033f..8dc0acd 100644
--- a/website/generated-content/documentation/basics/index.html
+++ b/website/generated-content/documentation/basics/index.html
@@ -18,7 +18,7 @@
function addPlaceholder(){$('input:text').attr('placeholder',"What are you
looking for?");}
function endSearch(){var
search=document.querySelector(".searchBar");search.classList.add("disappear");var
icons=document.querySelector("#iconsBar");icons.classList.remove("disappear");}
function blockScroll(){$("body").toggleClass("fixedPosition");}
-function openMenu(){addPlaceholder();blockScroll();}</script><div
class="clearfix container-main-content"><div class="section-nav closed"
data-offset-top=90 data-offset-bottom=500><span class="section-nav-back
glyphicon glyphicon-menu-left"></span><nav><ul class=section-nav-list
data-section-nav><li><span
class=section-nav-list-main-title>Documentation</span></li><li><a
href=/documentation>Using the Documentation</a></li><li
class=section-nav-item--collapsible><span class=section-nav-lis [...]
+function openMenu(){addPlaceholder();blockScroll();}</script><div
class="clearfix container-main-content"><div class="section-nav closed"
data-offset-top=90 data-offset-bottom=500><span class="section-nav-back
glyphicon glyphicon-menu-left"></span><nav><ul class=section-nav-list
data-section-nav><li><span
class=section-nav-list-main-title>Documentation</span></li><li><a
href=/documentation>Using the Documentation</a></li><li
class=section-nav-item--collapsible><span class=section-nav-lis [...]
data-parallel processing pipelines. To get started with Beam, you’ll
need to
understand an important set of core concepts:</p><ul><li><a
href=#pipeline><em>Pipeline</em></a> - A pipeline is a user-constructed graph of
transformations that defines the desired data processing
operations.</li><li><a href=#pcollection><em>PCollection</em></a> - A
<code>PCollection</code> is a data set or data
@@ -33,7 +33,10 @@ a <code>PCollection</code>. The schema for a
<code>PCollection</code> defines el
<code>PCollection</code> as an ordered list of named fields.</li><li><a
href=/documentation/sdks/java/><em>SDK</em></a> - A language-specific library
that lets
pipeline authors build transforms, construct their pipelines, and submit
them to a runner.</li><li><a href=#runner><em>Runner</em></a> - A runner runs
a Beam pipeline using the capabilities of
-your chosen data processing engine.</li></ul><p>The following sections cover
these concepts in more detail and provide links to
+your chosen data processing engine.</li><li><a
href=#splittable-dofn><em>Splittable DoFn</em></a> - Splittable DoFns let you
process
+elements in a non-monolithic way. You can checkpoint the processing of an
+element, and the runner can split the remaining work to yield additional
+parallelism.</li></ul><p>The following sections cover these concepts in more
detail and provide links to
additional documentation.</p><h3 id=pipeline>Pipeline</h3><p>A Beam pipeline
is a graph (specifically, a
<a href=https://en.wikipedia.org/wiki/Directed_acyclic_graph>directed acyclic
graph</a>)
of all the data and computations in your data processing task. This includes
@@ -182,7 +185,24 @@ Flink runner translates a Beam pipeline into a Flink job.
The Direct Runner runs
pipelines locally so you can test, debug, and validate that your pipeline
adheres to the Apache Beam model as closely as possible.</p><p>For an
up-to-date list of Beam runners and which features of the Apache Beam
model they support, see the runner
-<a href=/documentation/runners/capability-matrix/>capability
matrix</a>.</p><p>For more information about runners, see the following
pages:</p><ul><li><a href=/documentation/#choosing-a-runner>Choosing a
Runner</a></li><li><a href=/documentation/runners/capability-matrix/>Beam
Capability Matrix</a></li></ul><div class=feedback><p class=update>Last updated
on 2021/10/25</p><h3>Have you found everything you were looking for?</h3><p
class=description>Was it all useful and clear? Is there an [...]
+<a href=/documentation/runners/capability-matrix/>capability
matrix</a>.</p><p>For more information about runners, see the following
pages:</p><ul><li><a href=/documentation/#choosing-a-runner>Choosing a
Runner</a></li><li><a href=/documentation/runners/capability-matrix/>Beam
Capability Matrix</a></li></ul><h3 id=splittable-dofn>Splittable
DoFn</h3><p>Splittable <code>DoFn</code> (SDF) is a generalization of
<code>DoFn</code> that lets you process
+elements in a non-monolithic way. Splittable <code>DoFn</code> makes it easier
to create
+complex, modular I/O connectors in Beam.</p><p>A regular <code>ParDo</code>
processes an entire element at a time, applying your regular
+<code>DoFn</code> and waiting for the call to terminate. When you instead
apply a
+splittable <code>DoFn</code> to each element, the runner has the option of
splitting the
+element’s processing into smaller tasks. You can checkpoint the
processing of an
+element, and the runner can split the remaining work to yield additional
parallelism.</p><p>For example, imagine you want to read every line from very
large text files.
+When you write your splittable <code>DoFn</code>, you can have separate pieces
of logic to
+read a segment of a file, split a segment of a file into sub-segments, and
+report progress through the current segment. The runner can then invoke your
+splittable <code>DoFn</code> intelligently to split up each input and read
portions
+separately, in parallel.</p><p>A common computation pattern has the following
steps:</p><ol><li>The runner splits an incoming element before starting any
processing.</li><li>The runner starts running your processing logic on each
sub-element.</li><li>If the runner notices that some sub-elements are taking
longer than others,
+the runner splits those sub-elements further and repeats step 2.</li><li>The
sub-element either finishes processing, or the user chooses to
+checkpoint the sub-element and the runner repeats step 2.</li></ol><p>You can
also write your splittable <code>DoFn</code> so the runner can split the
unbounded
+processing. For example, if you write a splittable <code>DoFn</code> to watch
a set of
+directories and output filenames as they arrive, the runner can split the work
+by directory. This allows the runner to split off a hot
+directory and give it additional resources.</p><p>For more information about
Splittable <code>DoFn</code>, see the following pages:</p><ul><li><a
href=/documentation/programming-guide/#splittable-dofns>Splittable
DoFns</a></li><li><a href=/blog/splittable-do-fn-is-available/>Splittable DoFn
in Apache Beam is Ready to Use</a></li></ul><div class=feedback><p
class=update>Last updated on 2021/10/26</p><h3>Have you found everything you
were looking for?</h3><p class=description>Was it all us [...]
<a href=http://www.apache.org>The Apache Software Foundation</a>
| <a href=/privacy_policy>Privacy Policy</a>
| <a href=/feed.xml>RSS Feed</a><br><br>Apache Beam, Apache, Beam, the Beam
logo, and the Apache feather logo are either registered trademarks or
trademarks of The Apache Software Foundation. All other products or name brands
are trademarks of their respective holders, including The Apache Software
Foundation.</div></div></div></div></footer></body></html>
\ No newline at end of file
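The four-step computation pattern described in the new Splittable DoFn section can be modelled in plain Python. This is a minimal sketch under stated assumptions, not Beam's API: `ToyRange`, `ToyTracker`, `initial_split`, `process`, and `run` are hypothetical names invented for illustration (the Beam Python SDK has its own `OffsetRange` and `OffsetRestrictionTracker` in `apache_beam.io.restriction_trackers`, with a richer interface). Here a dynamic split by the runner is modelled as splitting the unclaimed remainder at fraction 0.5, and a checkpoint as a split at fraction 0.0, which hands all unclaimed work back as the residual.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Tuple


@dataclass(frozen=True)
class ToyRange:
    """Hypothetical restriction: the half-open offset range [start, stop)."""
    start: int
    stop: int


class ToyTracker:
    """Hypothetical, simplified restriction tracker (not Beam's API)."""

    def __init__(self, rng: ToyRange) -> None:
        self.rng = rng
        self.last_claimed: Optional[int] = None

    def try_claim(self, pos: int) -> bool:
        # A claim succeeds only if pos is still inside the (possibly shrunk) range.
        if pos >= self.rng.stop:
            return False
        self.last_claimed = pos
        return True

    def try_split(self, fraction: float) -> Optional[Tuple[ToyRange, ToyRange]]:
        # Only unclaimed work can be split off; claimed offsets stay in the primary.
        first_unclaimed = self.rng.start if self.last_claimed is None else self.last_claimed + 1
        split_pos = first_unclaimed + int((self.rng.stop - first_unclaimed) * fraction)
        if split_pos >= self.rng.stop:
            return None  # Nothing left to hand off.
        primary = ToyRange(self.rng.start, split_pos)
        residual = ToyRange(split_pos, self.rng.stop)
        self.rng = primary  # This task now stops at split_pos.
        return primary, residual


def initial_split(rng: ToyRange, parts: int) -> List[ToyRange]:
    # Step 1: split the element's range before any processing starts.
    size = rng.stop - rng.start
    bounds = [rng.start + size * i // parts for i in range(parts + 1)]
    return [ToyRange(a, b) for a, b in zip(bounds, bounds[1:]) if a < b]


def process(rng: ToyRange, queue: Deque[ToyRange], out: List[int],
            split_after: Optional[int] = None, fraction: float = 0.5) -> None:
    # Step 2: run the per-offset processing logic over one sub-range.
    tracker = ToyTracker(rng)
    pos = rng.start
    while tracker.try_claim(pos):
        out.append(pos)  # "Process" the offset by emitting it.
        pos += 1
        if split_after is not None and pos - rng.start == split_after:
            # Step 3 (fraction > 0): the simulated runner splits a slow task.
            # Step 4 (fraction == 0): the task checkpoints its remaining work.
            result = tracker.try_split(fraction)
            if result is not None:
                queue.append(result[1])  # The residual is rescheduled later.


def run(total: int) -> List[int]:
    queue: Deque[ToyRange] = deque(initial_split(ToyRange(0, total), 2))
    # Hypothetical plan: task 0 is split by the "runner", task 1 checkpoints.
    plans = {0: (2, 0.5), 1: (3, 0.0)}
    out: List[int] = []
    task = 0
    while queue:
        split_after, fraction = plans.get(task, (None, 0.5))
        process(queue.popleft(), queue, out, split_after, fraction)
        task += 1
    return out


print(run(20))
# prints [0, 1, 2, 3, 4, 5, 10, 11, 12, 6, 7, 8, 9, 13, 14, 15, 16, 17, 18, 19]
```

Every offset is emitted exactly once even though the range was split three times (once up front, once by the simulated runner, once by the checkpoint). What makes this work is that `try_split` only ever cuts after the last claimed offset, so the primary and residual ranges never overlap.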
diff --git a/website/generated-content/documentation/index.xml b/website/generated-content/documentation/index.xml
index 69e5ee9..3fd4cca 100644
--- a/website/generated-content/documentation/index.xml
+++ b/website/generated-content/documentation/index.xml
@@ -3205,6 +3205,10 @@ pipeline authors build transforms, construct their
pipelines, and submit
them to a runner.</li>
<li><a href="#runner"><em>Runner</em></a> - A runner runs a
Beam pipeline using the capabilities of
your chosen data processing engine.</li>
+<li><a href="#splittable-dofn"><em>Splittable DoFn</em></a> -
Splittable DoFns let you process
+elements in a non-monolithic way. You can checkpoint the processing of an
+element, and the runner can split the remaining work to yield additional
+parallelism.</li>
</ul>
<p>The following sections cover these concepts in more detail and provide
links to
additional documentation.</p>
@@ -3472,6 +3476,40 @@ model they support, see the runner
<ul>
<li><a href="/documentation/#choosing-a-runner">Choosing a
Runner</a></li>
<li><a href="/documentation/runners/capability-matrix/">Beam Capability
Matrix</a></li>
+</ul>
+<h3 id="splittable-dofn">Splittable DoFn</h3>
+<p>Splittable <code>DoFn</code> (SDF) is a generalization of
<code>DoFn</code> that lets you process
+elements in a non-monolithic way. Splittable <code>DoFn</code> makes it
easier to create
+complex, modular I/O connectors in Beam.</p>
+<p>A regular <code>ParDo</code> processes an entire element at a
time, applying your regular
+<code>DoFn</code> and waiting for the call to terminate. When you
instead apply a
+splittable <code>DoFn</code> to each element, the runner has the option
of splitting the
+element&rsquo;s processing into smaller tasks. You can checkpoint the
processing of an
+element, and the runner can split the remaining work to yield additional
parallelism.</p>
+<p>For example, imagine you want to read every line from very large text
files.
+When you write your splittable <code>DoFn</code>, you can have separate
pieces of logic to
+read a segment of a file, split a segment of a file into sub-segments, and
+report progress through the current segment. The runner can then invoke your
+splittable <code>DoFn</code> intelligently to split up each input and
read portions
+separately, in parallel.</p>
+<p>A common computation pattern has the following steps:</p>
+<ol>
+<li>The runner splits an incoming element before starting any
processing.</li>
+<li>The runner starts running your processing logic on each
sub-element.</li>
+<li>If the runner notices that some sub-elements are taking longer than
others,
+the runner splits those sub-elements further and repeats step 2.</li>
+<li>The sub-element either finishes processing, or the user chooses to
+checkpoint the sub-element and the runner repeats step 2.</li>
+</ol>
+<p>You can also write your splittable <code>DoFn</code> so the runner
can split the unbounded
+processing. For example, if you write a splittable <code>DoFn</code> to
watch a set of
+directories and output filenames as they arrive, the runner can split the work
+by directory. This allows the runner to split off a hot
+directory and give it additional resources.</p>
+<p>For more information about Splittable <code>DoFn</code>, see the
following pages:</p>
+<ul>
+<li><a
href="/documentation/programming-guide/#splittable-dofns">Splittable
DoFns</a></li>
+<li><a href="/blog/splittable-do-fn-is-available/">Splittable DoFn in
Apache Beam is Ready to Use</a></li>
</ul></description></item><item><title>Documentation: Beam
glossary</title><link>/documentation/glossary/</link><pubDate>Mon, 01 Jan 0001
00:00:00 +0000</pubDate><guid>/documentation/glossary/</guid><description>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
diff --git a/website/generated-content/sitemap.xml b/website/generated-content/sitemap.xml
index 32613c6..b570578 100644
--- a/website/generated-content/sitemap.xml
+++ b/website/generated-content/sitemap.xml
@@ -1 +1 @@
-<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.33.0/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/blog/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/categories/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/blog/b
[...]
\ No newline at end of file
+<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.33.0/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/blog/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/categories/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/blog/b
[...]
\ No newline at end of file
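The large-text-file example in the Splittable DoFn section separates three pieces of logic: reading a segment, splitting a segment into sub-segments, and reporting progress. The stdlib-only sketch below models those three pieces over an in-memory byte string. It assumes one common convention for splitting line-oriented input, namely that a line belongs to the segment containing its first byte; `read_segment`, `split_segment`, and `progress` are hypothetical names, and the code is not taken from Beam's own text source.

```python
from typing import Iterator, Tuple


def read_segment(data: bytes, start: int, stop: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, line) for every line whose first byte lies in [start, stop)."""
    pos = start
    if start > 0 and data[start - 1:start] != b"\n":
        # start falls mid-line; that line belongs to the previous segment.
        nl = data.find(b"\n", start)
        pos = len(data) if nl == -1 else nl + 1
    while pos < min(stop, len(data)):
        nl = data.find(b"\n", pos)
        end = len(data) if nl == -1 else nl + 1
        yield pos, data[pos:end].rstrip(b"\n")
        pos = end


def split_segment(start: int, stop: int,
                  fraction: float) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Split [start, stop) at a byte offset; no newline alignment is needed."""
    mid = start + int((stop - start) * fraction)
    return (start, mid), (mid, stop)


def progress(start: int, stop: int, pos: int) -> float:
    """Fraction of the segment consumed so far, clamped to [0, 1]."""
    if stop <= start:
        return 1.0
    return min(1.0, max(0.0, (pos - start) / (stop - start)))


data = b"alpha\nbravo\ncharlie\ndelta\n"
whole = [line for _, line in read_segment(data, 0, len(data))]
left, right = split_segment(0, len(data), 0.5)
parts = [line for seg in (left, right) for _, line in read_segment(data, *seg)]
print(whole == parts, parts)
# prints True [b'alpha', b'bravo', b'charlie', b'delta']
```

Because ownership is decided by a line's first byte, a split point chosen by the runner never has to land on a newline: each line is read by exactly one of the two sub-segments, whichever side of the cut its first byte falls on.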