This is an automated email from the ASF dual-hosted git repository.
git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 3a202e7e536 Publishing website 2022/04/08 22:17:40 at commit 2d27f44
3a202e7e536 is described below
commit 3a202e7e5364a61f3c1b9cf7b9f05acdc838de76
Author: jenkins <[email protected]>
AuthorDate: Fri Apr 8 22:17:41 2022 +0000
Publishing website 2022/04/08 22:17:40 at commit 2d27f44
---
.../get-started/from-spark/index.html | 26 +++++++++++++---------
website/generated-content/get-started/index.xml | 26 +++++++++++++---------
website/generated-content/sitemap.xml | 2 +-
3 files changed, 33 insertions(+), 21 deletions(-)
diff --git a/website/generated-content/get-started/from-spark/index.html
b/website/generated-content/get-started/from-spark/index.html
index 14a5fadc834..659e1951f3a 100644
--- a/website/generated-content/get-started/from-spark/index.html
+++ b/website/generated-content/get-started/from-spark/index.html
@@ -19,8 +19,8 @@ function
addPlaceholder(){$('input:text').attr('placeholder',"What are you looki
function endSearch(){var
search=document.querySelector(".searchBar");search.classList.add("disappear");var
icons=document.querySelector("#iconsBar");icons.classList.remove("disappear");}
function blockScroll(){$("body").toggleClass("fixedPosition");}
function openMenu(){addPlaceholder();blockScroll();}</script><div
class="clearfix container-main-content"><div class="section-nav closed"
data-offset-top=90 data-offset-bottom=500><span class="section-nav-back
glyphicon glyphicon-menu-left"></span><nav><ul class=section-nav-list
data-section-nav><li><span class=section-nav-list-main-title>Get
started</span></li><li><a href=/get-started/beam-overview/>Beam
Overview</a></li><li><a href=/get-started/tour-of-beam/>Tour of
Beam</a></li><li><s [...]
-learning <em>Apache Beam</em> is familiar.
-The Beam and Spark APIs are similar, so you already know the basic
concepts.</p><p>Spark stores data <em>Spark DataFrames</em> for structured data,
+using Beam should be easy.
+The basic concepts are the same, and the APIs are similar as well.</p><p>Spark
stores data in <em>Spark DataFrames</em> for structured data,
and in <em>Resilient Distributed Datasets</em> (RDD) for unstructured data.
We are using RDDs for this guide.</p><p>A Spark RDD represents a collection of
elements,
while in Beam it’s called a <em>Parallel Collection</em> (PCollection).
@@ -46,7 +46,8 @@ methods like <code>data.map(...)</code>, but they’re
doing the same thing.
<span class=o>|</span> <span class=n>beam</span><span
class=o>.</span><span class=n>Map</span><span class=p>(</span><span
class=k>print</span><span class=p>)</span>
<span class=p>)</span></code></pre></div></div></div><blockquote><p>ℹ️
Note that we called <code>print</code> inside a <code>Map</code> transform.
That’s because we can only access the elements of a PCollection
-from within a PTransform.</p></blockquote><p>Another thing to note is that
Beam pipelines are constructed lazily.
+from within a PTransform.
+To inspect the data locally, you can use the <a
href=https://cloud.google.com/dataflow/docs/guides/interactive-pipeline-development#creating_your_pipeline>InteractiveRunner</a>.</p></blockquote><p>Another
thing to note is that Beam pipelines are constructed lazily.
This means that when you pipe <code>|</code> data you’re only declaring
the
transformations and the order you want them to happen,
but the actual computation doesn’t happen.
@@ -74,10 +75,11 @@ we can’t guarantee that the results we’ve
calculated are available a
<span class=n>sc</span> <span class=o>=</span> <span
class=n>pyspark</span><span class=o>.</span><span
class=n>SparkContext</span><span class=p>()</span>
<span class=n>values</span> <span class=o>=</span> <span
class=n>sc</span><span class=o>.</span><span class=n>parallelize</span><span
class=p>([</span><span class=mi>1</span><span class=p>,</span> <span
class=mi>2</span><span class=p>,</span> <span class=mi>3</span><span
class=p>,</span> <span class=mi>4</span><span class=p>])</span>
-<span class=n>total</span> <span class=o>=</span> <span
class=n>values</span><span class=o>.</span><span class=n>reduce</span><span
class=p>(</span><span class=k>lambda</span> <span class=n>x</span><span
class=p>,</span> <span class=n>y</span><span class=p>:</span> <span
class=n>x</span> <span class=o>+</span> <span class=n>y</span><span
class=p>)</span>
+<span class=n>min_value</span> <span class=o>=</span> <span
class=n>values</span><span class=o>.</span><span class=n>reduce</span><span
class=p>(</span><span class=nb>min</span><span class=p>)</span>
+<span class=n>max_value</span> <span class=o>=</span> <span
class=n>values</span><span class=o>.</span><span class=n>reduce</span><span
class=p>(</span><span class=nb>max</span><span class=p>)</span>
-<span class=c1># We can simply use `total` since it's already a Python
`int` value from `reduce`.</span>
-<span class=n>scaled_values</span> <span class=o>=</span> <span
class=n>values</span><span class=o>.</span><span class=n>map</span><span
class=p>(</span><span class=k>lambda</span> <span class=n>x</span><span
class=p>:</span> <span class=n>x</span> <span class=o>/</span> <span
class=n>total</span><span class=p>)</span>
+<span class=c1># We can simply use `min_value` and `max_value` since they're
already Python `int` values from `reduce`.</span>
+<span class=n>scaled_values</span> <span class=o>=</span> <span
class=n>values</span><span class=o>.</span><span class=n>map</span><span
class=p>(</span><span class=k>lambda</span> <span class=n>x</span><span
class=p>:</span> <span class=p>(</span><span class=n>x</span> <span
class=o>-</span> <span class=n>min_value</span><span class=p>)</span> <span
class=o>/</span> <span class=p>(</span><span class=n>max_value</span> <span
class=o>-</span> <span class=n>min_value</span><span class=p>))</span>
<span class=c1># But to access `scaled_values`, we need to call
`collect`.</span>
<span class=k>print</span><span class=p>(</span><span
class=n>scaled_values</span><span class=o>.</span><span
class=n>collect</span><span
class=p>())</span></code></pre></div></div></div><p>In Beam, every transform
results in a PCollection.
@@ -93,15 +95,19 @@ and access them as an <a
href=https://docs.python.org/3/glossary.html#term-itera
<span class=k>with</span> <span class=n>beam</span><span class=o>.</span><span
class=n>Pipeline</span><span class=p>()</span> <span class=k>as</span> <span
class=n>pipeline</span><span class=p>:</span>
<span class=n>values</span> <span class=o>=</span> <span
class=n>pipeline</span> <span class=o>|</span> <span class=n>beam</span><span
class=o>.</span><span class=n>Create</span><span class=p>([</span><span
class=mi>1</span><span class=p>,</span> <span class=mi>2</span><span
class=p>,</span> <span class=mi>3</span><span class=p>,</span> <span
class=mi>4</span><span class=p>])</span>
- <span class=n>total</span> <span class=o>=</span> <span
class=n>values</span> <span class=o>|</span> <span class=n>beam</span><span
class=o>.</span><span class=n>CombineGlobally</span><span class=p>(</span><span
class=nb>sum</span><span class=p>)</span>
+ <span class=n>min_value</span> <span class=o>=</span> <span
class=n>values</span> <span class=o>|</span> <span class=n>beam</span><span
class=o>.</span><span class=n>CombineGlobally</span><span class=p>(</span><span
class=nb>min</span><span class=p>)</span>
+ <span class=n>max_value</span> <span class=o>=</span> <span
class=n>values</span> <span class=o>|</span> <span class=n>beam</span><span
class=o>.</span><span class=n>CombineGlobally</span><span class=p>(</span><span
class=nb>max</span><span class=p>)</span>
<span class=c1># To access `min_value` and `max_value`, we need to pass them
as side inputs.</span>
<span class=n>scaled_values</span> <span class=o>=</span> <span
class=n>values</span> <span class=o>|</span> <span class=n>beam</span><span
class=o>.</span><span class=n>Map</span><span class=p>(</span>
- <span class=k>lambda</span> <span class=n>x</span><span
class=p>,</span> <span class=n>total</span><span class=p>:</span> <span
class=n>x</span> <span class=o>/</span> <span class=n>total</span><span
class=p>,</span>
- <span class=n>total</span><span class=o>=</span><span
class=n>beam</span><span class=o>.</span><span class=n>pvalue</span><span
class=o>.</span><span class=n>AsSingleton</span><span class=p>(</span><span
class=n>total</span><span class=p>))</span>
+ <span class=k>lambda</span> <span class=n>x</span><span
class=p>,</span> <span class=n>min_value</span><span class=p>,</span> <span
class=n>max_value</span><span class=p>:</span> <span class=p>(</span><span
class=n>x</span> <span class=o>-</span> <span class=n>min_value</span><span
class=p>)</span> <span class=o>/</span> <span class=p>(</span><span
class=n>max_va [...]
+ <span class=n>min_value</span><span class=o>=</span><span
class=n>beam</span><span class=o>.</span><span class=n>pvalue</span><span
class=o>.</span><span class=n>AsSingleton</span><span class=p>(</span><span
class=n>min_value</span><span class=p>),</span>
+ <span class=n>max_value</span><span class=o>=</span><span
class=n>beam</span><span class=o>.</span><span class=n>pvalue</span><span
class=o>.</span><span class=n>AsSingleton</span><span class=p>(</span><span
class=n>max_value</span><span class=p>))</span>
<span class=n>scaled_values</span> <span class=o>|</span> <span
class=n>beam</span><span class=o>.</span><span class=n>Map</span><span
class=p>(</span><span class=k>print</span><span
class=p>)</span></code></pre></div></div></div><blockquote><p>ℹ️ In Beam we
need to pass a side input explicitly, but we get the
-benefit that a reduction or aggregation does <em>not</em> have to fit into
memory.</p></blockquote><h2 id=next-steps>Next Steps</h2><ul><li>Take a look at
all the available transforms in the <a
href=/documentation/transforms/python/overview>Python transform
gallery</a>.</li><li>Learn how to read from and write to files in the <a
href=/documentation/programming-guide/#pipeline-io><em>Pipeline I/O</em>
section of the <em>Programming guide</em></a></li><li>Walk through additional
WordCount [...]
+benefit that a reduction or aggregation does <em>not</em> have to fit into
memory.
+Lazily computing side inputs also allows us to compute <code>values</code>
only once,
+rather than for each distinct reduction (or requiring explicit caching of the
RDD).</p></blockquote><h2 id=next-steps>Next Steps</h2><ul><li>Take a look at
all the available transforms in the <a
href=/documentation/transforms/python/overview>Python transform
gallery</a>.</li><li>Learn how to read from and write to files in the <a
href=/documentation/programming-guide/#pipeline-io><em>Pipeline I/O</em>
section of the <em>Programming guide</em></a>.</li><li>Walk through additional
WordCount [...]
<a href=http://www.apache.org>The Apache Software Foundation</a>
| <a href=/privacy_policy>Privacy Policy</a>
| <a href=/feed.xml>RSS Feed</a><br><br>Apache Beam, Apache, Beam, the Beam
logo, and the Apache feather logo are either registered trademarks or
trademarks of The Apache Software Foundation. All other products or name brands
are trademarks of their respective holders, including The Apache Software
Foundation.</div></div></div></div></footer></body></html>
\ No newline at end of file
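The hunk above replaces a global `sum` with min-max normalization. Stripped of the Spark and Beam APIs, the arithmetic the corrected lambda performs reduces to plain Python; this is a framework-free sketch of the computation, not the page's literal code:

```python
values = [1, 2, 3, 4]

# The two global aggregates the pipelines compute via reduce()/CombineGlobally.
min_value = min(values)
max_value = max(values)

# Rescale each element into [0, 1], exactly as the fixed lambda does.
scaled_values = [(x - min_value) / (max_value - min_value) for x in values]
print(scaled_values)  # [0.0, 0.3333333333333333, 0.6666666666666666, 1.0]
```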
diff --git a/website/generated-content/get-started/index.xml
b/website/generated-content/get-started/index.xml
index 0092b70f547..85b9db78a56 100644
--- a/website/generated-content/get-started/index.xml
+++ b/website/generated-content/get-started/index.xml
@@ -4300,8 +4300,8 @@ limitations under the License.
localStorage.setItem("language", "language-py")
</script>
<p>If you already know <a href="http://spark.apache.org/"><em>Apache
Spark</em></a>,
-learning <em>Apache Beam</em> is familiar.
-The Beam and Spark APIs are similar, so you already know the basic
concepts.</p>
+using Beam should be easy.
+The basic concepts are the same, and the APIs are similar as well.</p>
<p>Spark stores data in <em>Spark DataFrames</em> for structured data,
and in <em>Resilient Distributed Datasets</em> (RDD) for unstructured
data.
We are using RDDs for this guide.</p>
@@ -4351,7 +4351,8 @@ methods like <code>data.map(...)</code>, but
they&rsquo;re doing the s
<blockquote>
<p>ℹ️ Note that we called <code>print</code> inside a
<code>Map</code> transform.
That&rsquo;s because we can only access the elements of a PCollection
-from within a PTransform.</p>
+from within a PTransform.
+To inspect the data locally, you can use the <a
href="https://cloud.google.com/dataflow/docs/guides/interactive-pipeline-development#creating_your_pipeline">InteractiveRunner</a>.</p>
</blockquote>
<p>Another thing to note is that Beam pipelines are constructed lazily.
This means that when you pipe <code>|</code> data you&rsquo;re only
declaring the
@@ -4549,9 +4550,10 @@ we can&rsquo;t guarantee that the results
we&rsquo;ve calculated are ava
<div class="highlight"><pre class="chroma"><code class="language-py"
data-lang="py"><span class="kn">import</span> <span
class="nn">pyspark</span>
<span class="n">sc</span> <span class="o">=</span> <span
class="n">pyspark</span><span class="o">.</span><span
class="n">SparkContext</span><span class="p">()</span>
<span class="n">values</span> <span class="o">=</span> <span
class="n">sc</span><span class="o">.</span><span
class="n">parallelize</span><span class="p">([</span><span
class="mi">1</span><span class="p">,</span> <span
class="mi">2</span><span class="p">,</span> <span
class="mi">3</span><span class="p">,</span> <span
class="mi">4</span><span class="p">])</span>
-<span class="n">total</span> <span class="o">=</span> <span
class="n">values</span><span class="o">.</span><span
class="n">reduce</span><span class="p">(</span><span
class="k">lambda</span> <span class="n">x</span><span
class="p">,</span> <span class="n">y</span><span
class="p">:</span> <span class="n">x</span> <span
class="o">+</span> <span class="n">y</span><span
class="p">)</span>
-<span class="c1"># We can simply use `total` since it&#39;s already a
Python `int` value from `reduce`.</span>
-<span class="n">scaled_values</span> <span class="o">=</span>
<span class="n">values</span><span class="o">.</span><span
class="n">map</span><span class="p">(</span><span
class="k">lambda</span> <span class="n">x</span><span
class="p">:</span> <span class="n">x</span> <span
class="o">/</span> <span class="n">total</span><span
class="p">)</span>
+<span class="n">min_value</span> <span class="o">=</span> <span
class="n">values</span><span class="o">.</span><span
class="n">reduce</span><span class="p">(</span><span
class="nb">min</span><span class="p">)</span>
+<span class="n">max_value</span> <span class="o">=</span> <span
class="n">values</span><span class="o">.</span><span
class="n">reduce</span><span class="p">(</span><span
class="nb">max</span><span class="p">)</span>
+<span class="c1"># We can simply use `min_value` and `max_value` since
they&#39;re already Python `int` values from `reduce`.</span>
+<span class="n">scaled_values</span> <span class="o">=</span>
<span class="n">values</span><span class="o">.</span><span
class="n">map</span><span class="p">(</span><span
class="k">lambda</span> <span class="n">x</span><span
class="p">:</span> <span class="p">(</span><span
class="n">x</span> <span class="o">-</span> <span
class="n">min_value</span><span class="p">)</span> <span
class="o">/</span> &l [...]
<span class="c1"># But to access `scaled_values`, we need to call
`collect`.</span>
<span class="k">print</span><span class="p">(</span><span
class="n">scaled_values</span><span class="o">.</span><span
class="n">collect</span><span
class="p">())</span></code></pre></div>
</div>
@@ -4575,17 +4577,21 @@ and access them as an <a
href="https://docs.python.org/3/glossary.html#term-i
<div class="highlight"><pre class="chroma"><code class="language-py"
data-lang="py"><span class="kn">import</span> <span
class="nn">apache_beam</span> <span class="kn">as</span> <span
class="nn">beam</span>
<span class="k">with</span> <span class="n">beam</span><span
class="o">.</span><span class="n">Pipeline</span><span
class="p">()</span> <span class="k">as</span> <span
class="n">pipeline</span><span class="p">:</span>
<span class="n">values</span> <span class="o">=</span> <span
class="n">pipeline</span> <span class="o">|</span> <span
class="n">beam</span><span class="o">.</span><span
class="n">Create</span><span class="p">([</span><span
class="mi">1</span><span class="p">,</span> <span
class="mi">2</span><span class="p">,</span> <span
class="mi">3</span><span class="p">,</span> <span
class="mi">4</span><span c [...]
-<span class="n">total</span> <span class="o">=</span> <span
class="n">values</span> <span class="o">|</span> <span
class="n">beam</span><span class="o">.</span><span
class="n">CombineGlobally</span><span class="p">(</span><span
class="nb">sum</span><span class="p">)</span>
+<span class="n">min_value</span> <span class="o">=</span> <span
class="n">values</span> <span class="o">|</span> <span
class="n">beam</span><span class="o">.</span><span
class="n">CombineGlobally</span><span class="p">(</span><span
class="nb">min</span><span class="p">)</span>
+<span class="n">max_value</span> <span class="o">=</span> <span
class="n">values</span> <span class="o">|</span> <span
class="n">beam</span><span class="o">.</span><span
class="n">CombineGlobally</span><span class="p">(</span><span
class="nb">max</span><span class="p">)</span>
<span class="c1"># To access `min_value` and `max_value`, we need to pass
them as side inputs.</span>
<span class="n">scaled_values</span> <span class="o">=</span>
<span class="n">values</span> <span class="o">|</span> <span
class="n">beam</span><span class="o">.</span><span
class="n">Map</span><span class="p">(</span>
-<span class="k">lambda</span> <span class="n">x</span><span
class="p">,</span> <span class="n">total</span><span
class="p">:</span> <span class="n">x</span> <span
class="o">/</span> <span class="n">total</span><span
class="p">,</span>
-<span class="n">total</span><span class="o">=</span><span
class="n">beam</span><span class="o">.</span><span
class="n">pvalue</span><span class="o">.</span><span
class="n">AsSingleton</span><span class="p">(</span><span
class="n">total</span><span class="p">))</span>
+<span class="k">lambda</span> <span class="n">x</span><span
class="p">,</span> <span class="n">min_value</span><span
class="p">,</span> <span class="n">max_value</span><span
class="p">:</span> <span class="p">(</span><span
class="n">x</span> <span class="o">-</span> <sp [...]
+<span class="n">min_value</span><span class="o">=</span><span
class="n">beam</span><span class="o">.</span><span
class="n">pvalue</span><span class="o">.</span><span
class="n">AsSingleton</span><span class="p">(</span><span
class="n">min_value</span><span class="p">),</span>
+<span class="n">max_value</span><span class="o">=</span><span
class="n">beam</span><span class="o">.</span><span
class="n">pvalue</span><span class="o">.</span><span
class="n">AsSingleton</span><span class="p">(</span><span
class="n">max_value</span><span class="p">))</span>
<span class="n">scaled_values</span> <span class="o">|</span>
<span class="n">beam</span><span class="o">.</span><span
class="n">Map</span><span class="p">(</span><span
class="k">print</span><span
class="p">)</span></code></pre></div>
</div>
</div>
<blockquote>
<p>ℹ️ In Beam we need to pass a side input explicitly, but we get the
-benefit that a reduction or aggregation does <em>not</em> have to fit
into memory.</p>
+benefit that a reduction or aggregation does <em>not</em> have to fit
into memory.
+Lazily computing side inputs also allows us to compute
<code>values</code> only once,
+rather than for each distinct reduction (or requiring explicit caching of the
RDD).</p>
</blockquote>
<h2 id="next-steps">Next Steps</h2>
<ul>
diff --git a/website/generated-content/sitemap.xml
b/website/generated-content/sitemap.xml
index a01d82b2f92..567e77a0641 100644
--- a/website/generated-content/sitemap.xml
+++ b/website/generated-content/sitemap.xml
@@ -1 +1 @@
-<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.37.0/</loc><lastmod>2022-03-04T10:14:02-08:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2022-03-28T08:41:34-07:00</lastmod></url><url><loc>/blog/</loc><lastmod>2022-03-28T08:41:34-07:00</lastmod></url><url><loc>/categories/</loc><lastmod>2022-03-28T08:41:34-07:00</lastmod></url><url><loc>/blog/u
[...]
\ No newline at end of file
+<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.37.0/</loc><lastmod>2022-03-04T10:14:02-08:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2022-03-28T08:41:34-07:00</lastmod></url><url><loc>/blog/</loc><lastmod>2022-03-28T08:41:34-07:00</lastmod></url><url><loc>/categories/</loc><lastmod>2022-03-28T08:41:34-07:00</lastmod></url><url><loc>/blog/u
[...]
\ No newline at end of file