This is an automated email from the ASF dual-hosted git repository.
git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 6c94813 Publishing website 2021/10/16 00:01:48 at commit 024d96c
6c94813 is described below
commit 6c948132248d45ca8f7aff559c0b0eb82fd2cb43
Author: jenkins <[email protected]>
AuthorDate: Sat Oct 16 00:01:49 2021 +0000
Publishing website 2021/10/16 00:01:48 at commit 024d96c
---
.../documentation/dsls/dataframes/overview/index.html | 17 +++++++++--------
.../documentation/dsls/sql/walkthrough/index.html | 5 +++--
website/generated-content/sitemap.xml | 2 +-
3 files changed, 13 insertions(+), 11 deletions(-)
diff --git a/website/generated-content/documentation/dsls/dataframes/overview/index.html b/website/generated-content/documentation/dsls/dataframes/overview/index.html
index a67b527..a1398bc 100644
--- a/website/generated-content/documentation/dsls/dataframes/overview/index.html
+++ b/website/generated-content/documentation/dsls/dataframes/overview/index.html
@@ -22,12 +22,16 @@ function openMenu(){addPlaceholder();blockScroll();}</script><div class="clearfi
Run in Colab</a></td></table><p><br><br><br><br></p><p>The Apache Beam Python
SDK provides a DataFrame API for working with pandas-like <a
href=https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html>DataFrame</a>
objects. The feature lets you convert a PCollection to a DataFrame and then
interact with the DataFrame using the standard methods available on the pandas
DataFrame API. The DataFrame API is built on top of the pandas implementation,
and pandas DataFram [...]
</code></pre><p>Note that the <em>same</em> <code>pandas</code> version should
be installed on workers when executing DataFrame API pipelines on distributed
runners. Reference <a
href=https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt><code>base_image_requirements.txt</code></a>
for the Beam release you are using to see what version of <code>pandas</code>
will be used by default on workers.</p><h2 id=using-dataframes>Using
DataFrames</h2><p>You c [...]
-<span class=k>with</span> <span class=n>beam</span><span class=o>.</span><span
class=n>Pipeline</span><span class=p>()</span> <span class=k>as</span> <span
class=n>p</span><span class=p>:</span>
- <span class=n>df</span> <span class=o>=</span> <span class=n>p</span> <span
class=o>|</span> <span class=n>read_csv</span><span class=p>(</span><span
class=s2>"gs://apache-beam-samples/nyc_taxi/misc/sample.csv"</span><span
class=p>)</span>
- <span class=n>agg</span> <span class=o>=</span> <span class=n>df</span><span
class=p>[[</span><span class=s1>'passenger_count'</span><span
class=p>,</span> <span class=s1>'DOLocationID'</span><span
class=p>]]</span><span class=o>.</span><span class=n>groupby</span><span
class=p>(</span><span class=s1>'DOLocationID'</span><span
class=p>)</span><span class=o>.</span><span class=n>sum</span><span
class=p>()</span>
- <span class=n>agg</span><span class=o>.</span><span
class=n>to_csv</span><span class=p>(</span><span
class=s1>'output'</span><span
class=p>)</span></code></pre></div></div></div><p>pandas is able to infer
column names from the first row of the CSV data, which is where
<code>passenger_count</code> and <code>DOLocationID</code> come from.</p><p>In
this example, the only traditional Beam type is the <code>Pipeline</code>
instance. Otherwise the example is written completely with t [...]
+<span class=k>with</span> <span class=n>pipeline</span> <span
class=k>as</span> <span class=n>p</span><span class=p>:</span>
+ <span class=n>rides</span> <span class=o>=</span> <span class=n>p</span>
<span class=o>|</span> <span class=n>read_csv</span><span class=p>(</span><span
class=n>input_path</span><span class=p>)</span>
+
+ <span class=c1># Count the number of passengers dropped off per
LocationID</span>
+ <span class=n>agg</span> <span class=o>=</span> <span
class=n>rides</span><span class=o>.</span><span class=n>groupby</span><span
class=p>(</span><span class=s1>'DOLocationID'</span><span
class=p>)</span><span class=o>.</span><span class=n>passenger_count</span><span
class=o>.</span><span class=n>sum</span><span class=p>()</span>
+ <span class=n>agg</span><span class=o>.</span><span
class=n>to_csv</span><span class=p>(</span><span
class=n>output_path</span><span
class=p>)</span></code></pre></div></div></div><p>pandas is able to infer
column names from the first row of the CSV data, which is where
<code>passenger_count</code> and <code>DOLocationID</code> come from.</p><p>In
this example, the only traditional Beam type is the <code>Pipeline</code>
instance. Otherwise the example is written completely with the Dat [...]
<span class=kn>from</span> <span class=nn>apache_beam.dataframe.convert</span>
<span class=kn>import</span> <span class=n>to_pcollection</span>
<span class=o>...</span>
+
+
<span class=c1># Read the text file[pattern] into a PCollection.</span>
<span class=n>lines</span> <span class=o>=</span> <span class=n>p</span>
<span class=o>|</span> <span class=s1>'Read'</span> <span
class=o>>></span> <span class=n>ReadFromText</span><span
class=p>(</span><span class=n>known_args</span><span class=o>.</span><span
class=n>input</span><span class=p>)</span>
@@ -45,10 +49,7 @@ Run in Colab</a></td></table><p><br><br><br><br></p><p>The Apache Beam Python SD
<span class=n>counted</span><span class=o>.</span><span
class=n>to_csv</span><span class=p>(</span><span class=n>known_args</span><span
class=o>.</span><span class=n>output</span><span class=p>)</span>
<span class=c1># Deferred DataFrames can also be converted back to
schema'd PCollections</span>
- <span class=n>counted_pc</span> <span class=o>=</span> <span
class=n>to_pcollection</span><span class=p>(</span><span
class=n>counted</span><span class=p>,</span> <span
class=n>include_indexes</span><span class=o>=</span><span
class=bp>True</span><span class=p>)</span>
-
- <span class=c1># Do something with counted_pc</span>
- <span class=o>...</span></code></pre></div></div></div><p>You can find the
full wordcount example on
+ <span class=n>counted_pc</span> <span class=o>=</span> <span
class=n>to_pcollection</span><span class=p>(</span><span
class=n>counted</span><span class=p>,</span> <span
class=n>include_indexes</span><span class=o>=</span><span
class=bp>True</span><span class=p>)</span></code></pre></div></div></div><p>You
can find the full wordcount example on
<a
href=https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/dataframe/wordcount.py>GitHub</a>,
along with other <a
href=https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/dataframe/>example
DataFrame pipelines</a>.</p><p>It’s also possible to use the DataFrame API by
passing a function to <a
href=https://beam.apache.org/releases/pydoc/current/apache_beam.dataframe.transforms.html#apache_beam.dataframe.transforms.DataframeTransform><code>DataframeTransform</code></a>:</p><div
class="language-py snippet"><div class="notebook-skip code-snippet"><a
class=copy [...]
diff --git a/website/generated-content/documentation/dsls/sql/walkthrough/index.html b/website/generated-content/documentation/dsls/sql/walkthrough/index.html
index 4287556..a0e6f89 100644
--- a/website/generated-content/documentation/dsls/sql/walkthrough/index.html
+++ b/website/generated-content/documentation/dsls/sql/walkthrough/index.html
@@ -113,8 +113,9 @@ to either a single <code>PCollection</code> or a <code>PCollectionTuple</code> w
</span><span class=c1></span> <span class=c1>// by joining two PCollections
</span><span class=c1></span> <span class=n>PCollection</span><span
class=o><</span><span class=n>Row</span><span class=o>></span> <span
class=n>output</span> <span class=o>=</span> <span
class=n>namesAndFoods</span><span class=o>.</span><span
class=na>apply</span><span class=o>(</span>
<span class=n>SqlTransform</span><span class=o>.</span><span
class=na>query</span><span class=o>(</span>
- <span class=s>"SELECT Names.appId, COUNT(Reviews.rating),
AVG(Reviews.rating)"</span>
- <span class=o>+</span> <span class=s>"FROM Apps INNER JOIN
Reviews ON Apps.appId == Reviews.appId"</span><span class=o>));</span>
+ <span class=s>"SELECT Apps.appId, COUNT(Reviews.rating),
AVG(Reviews.rating) "</span>
+ <span class=o>+</span> <span class=s>"FROM Apps INNER JOIN
Reviews ON Apps.appId = Reviews.appId "</span>
+ <span class=o>+</span> <span class=s>"GROUP BY
Apps.appId"</span><span class=o>));</span>
</code></pre></div></div></div></p></li></ul><p><a
href=https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/example/BeamSqlExample.java>BeamSqlExample</a>
in the code repository shows basic usage of both APIs.</p></div></div><footer
class=footer><div class=footer__contained><div class=footer__cols><div
class="footer__cols__col footer__cols__col__logos"><div
class=footer__cols__col__logo><img src=/images/beam_logo_circle.svg
class=footer__logo alt="Beam logo"></div><div
class=footer__cols__col__logo><img src=/images/apache_logo_circle.svg
class=footer__logo alt="Apache logo"></div></div><div class=footer-wrapper><div
class=wrapper-grid><div [...]
<a href=http://www.apache.org>The Apache Software Foundation</a>
diff --git a/website/generated-content/sitemap.xml b/website/generated-content/sitemap.xml
index 0bfeeb6..a962144 100644
--- a/website/generated-content/sitemap.xml
+++ b/website/generated-content/sitemap.xml
@@ -1 +1 @@
-<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.33.0/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/blog/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/categories/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/blog/b
[...]
\ No newline at end of file
+<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.33.0/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/blog/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/categories/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/blog/b
[...]
\ No newline at end of file