[beam] branch asf-site updated: Publishing website 2021/10/16 00:01:48 at commit 024d96c

git-site-role Fri, 15 Oct 2021 17:02:32 -0700

This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new 6c94813  Publishing website 2021/10/16 00:01:48 at commit 024d96c
6c94813 is described below

commit 6c948132248d45ca8f7aff559c0b0eb82fd2cb43
Author: jenkins <[email protected]>
AuthorDate: Sat Oct 16 00:01:49 2021 +0000

    Publishing website 2021/10/16 00:01:48 at commit 024d96c
---
 .../documentation/dsls/dataframes/overview/index.html   | 17 +++++++++--------
 .../documentation/dsls/sql/walkthrough/index.html       |  5 +++--
 website/generated-content/sitemap.xml                   |  2 +-
 3 files changed, 13 insertions(+), 11 deletions(-)

diff --git 
a/website/generated-content/documentation/dsls/dataframes/overview/index.html 
b/website/generated-content/documentation/dsls/dataframes/overview/index.html
index a67b527..a1398bc 100644
--- 
a/website/generated-content/documentation/dsls/dataframes/overview/index.html
+++ 
b/website/generated-content/documentation/dsls/dataframes/overview/index.html
@@ -22,12 +22,16 @@ function 
openMenu(){addPlaceholder();blockScroll();}</script><div class="clearfi
 Run in Colab</a></td></table><p><br><br><br><br></p><p>The Apache Beam Python 
SDK provides a DataFrame API for working with pandas-like <a 
href=https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html>DataFrame</a>
 objects. The feature lets you convert a PCollection to a DataFrame and then 
interact with the DataFrame using the standard methods available on the pandas 
DataFrame API. The DataFrame API is built on top of the pandas implementation, 
and pandas DataFram [...]
 </code></pre><p>Note that the <em>same</em> <code>pandas</code> version should 
be installed on workers when executing DataFrame API pipelines on distributed 
runners. Reference <a 
href=https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt><code>base_image_requirements.txt</code></a>
 for the Beam release you are using to see what version of <code>pandas</code> 
will be used by default on workers.</p><h2 id=using-dataframes>Using 
DataFrames</h2><p>You c [...]
 
-<span class=k>with</span> <span class=n>beam</span><span class=o>.</span><span 
class=n>Pipeline</span><span class=p>()</span> <span class=k>as</span> <span 
class=n>p</span><span class=p>:</span>
-  <span class=n>df</span> <span class=o>=</span> <span class=n>p</span> <span 
class=o>|</span> <span class=n>read_csv</span><span class=p>(</span><span 
class=s2>&#34;gs://apache-beam-samples/nyc_taxi/misc/sample.csv&#34;</span><span
 class=p>)</span>
-  <span class=n>agg</span> <span class=o>=</span> <span class=n>df</span><span 
class=p>[[</span><span class=s1>&#39;passenger_count&#39;</span><span 
class=p>,</span> <span class=s1>&#39;DOLocationID&#39;</span><span 
class=p>]]</span><span class=o>.</span><span class=n>groupby</span><span 
class=p>(</span><span class=s1>&#39;DOLocationID&#39;</span><span 
class=p>)</span><span class=o>.</span><span class=n>sum</span><span 
class=p>()</span>
-  <span class=n>agg</span><span class=o>.</span><span 
class=n>to_csv</span><span class=p>(</span><span 
class=s1>&#39;output&#39;</span><span 
class=p>)</span></code></pre></div></div></div><p>pandas is able to infer 
column names from the first row of the CSV data, which is where 
<code>passenger_count</code> and <code>DOLocationID</code> come from.</p><p>In 
this example, the only traditional Beam type is the <code>Pipeline</code> 
instance. Otherwise the example is written completely with t [...]
+<span class=k>with</span> <span class=n>pipeline</span> <span 
class=k>as</span> <span class=n>p</span><span class=p>:</span>
+  <span class=n>rides</span> <span class=o>=</span> <span class=n>p</span> 
<span class=o>|</span> <span class=n>read_csv</span><span class=p>(</span><span 
class=n>input_path</span><span class=p>)</span>
+
+  <span class=c1># Count the number of passengers dropped off per 
LocationID</span>
+  <span class=n>agg</span> <span class=o>=</span> <span 
class=n>rides</span><span class=o>.</span><span class=n>groupby</span><span 
class=p>(</span><span class=s1>&#39;DOLocationID&#39;</span><span 
class=p>)</span><span class=o>.</span><span class=n>passenger_count</span><span 
class=o>.</span><span class=n>sum</span><span class=p>()</span>
+  <span class=n>agg</span><span class=o>.</span><span 
class=n>to_csv</span><span class=p>(</span><span 
class=n>output_path</span><span 
class=p>)</span></code></pre></div></div></div><p>pandas is able to infer 
column names from the first row of the CSV data, which is where 
<code>passenger_count</code> and <code>DOLocationID</code> come from.</p><p>In 
this example, the only traditional Beam type is the <code>Pipeline</code> 
instance. Otherwise the example is written completely with the Dat [...]
 <span class=kn>from</span> <span class=nn>apache_beam.dataframe.convert</span> 
<span class=kn>import</span> <span class=n>to_pcollection</span>
 <span class=o>...</span>
+
+
     <span class=c1># Read the text file[pattern] into a PCollection.</span>
     <span class=n>lines</span> <span class=o>=</span> <span class=n>p</span> 
<span class=o>|</span> <span class=s1>&#39;Read&#39;</span> <span 
class=o>&gt;&gt;</span> <span class=n>ReadFromText</span><span 
class=p>(</span><span class=n>known_args</span><span class=o>.</span><span 
class=n>input</span><span class=p>)</span>
 
@@ -45,10 +49,7 @@ Run in Colab</a></td></table><p><br><br><br><br></p><p>The 
Apache Beam Python SD
     <span class=n>counted</span><span class=o>.</span><span 
class=n>to_csv</span><span class=p>(</span><span class=n>known_args</span><span 
class=o>.</span><span class=n>output</span><span class=p>)</span>
 
     <span class=c1># Deferred DataFrames can also be converted back to 
schema&#39;d PCollections</span>
-    <span class=n>counted_pc</span> <span class=o>=</span> <span 
class=n>to_pcollection</span><span class=p>(</span><span 
class=n>counted</span><span class=p>,</span> <span 
class=n>include_indexes</span><span class=o>=</span><span 
class=bp>True</span><span class=p>)</span>
-
-    <span class=c1># Do something with counted_pc</span>
-    <span class=o>...</span></code></pre></div></div></div><p>You can find the 
full wordcount example on
+    <span class=n>counted_pc</span> <span class=o>=</span> <span 
class=n>to_pcollection</span><span class=p>(</span><span 
class=n>counted</span><span class=p>,</span> <span 
class=n>include_indexes</span><span class=o>=</span><span 
class=bp>True</span><span class=p>)</span></code></pre></div></div></div><p>You 
can find the full wordcount example on
 <a 
href=https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/dataframe/wordcount.py>GitHub</a>,
 along with other <a 
href=https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/dataframe/>example
 DataFrame pipelines</a>.</p><p>It’s also possible to use the DataFrame API by 
passing a function to <a 
href=https://beam.apache.org/releases/pydoc/current/apache_beam.dataframe.transforms.html#apache_beam.dataframe.transforms.DataframeTransform><code>DataframeTransform</code></a>:</p><div
 class="language-py snippet"><div class="notebook-skip code-snippet"><a 
class=copy  [...]
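
For readers who want to see what the updated DataFrame snippet above computes, here is a minimal plain-Python sketch of the same aggregation: summing `passenger_count` per `DOLocationID`, with column names taken from the first CSV row. It deliberately uses only the standard library rather than Beam or pandas, and the sample rows and the helper name `passengers_per_dropoff` are hypothetical, not part of the commit.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real pipeline reads whatever `input_path` points to.
SAMPLE_CSV = """passenger_count,DOLocationID
1,138
2,138
3,50
1,50
4,7
"""


def passengers_per_dropoff(csv_text):
    """Sum passenger_count for each DOLocationID.

    This mirrors the intent of
    rides.groupby('DOLocationID').passenger_count.sum()
    without using Beam or pandas.
    """
    totals = defaultdict(int)
    # DictReader takes the column names from the first row of the CSV data.
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["DOLocationID"]] += int(row["passenger_count"])
    return dict(totals)


if __name__ == "__main__":
    for location_id, total in passengers_per_dropoff(SAMPLE_CSV).items():
        print(location_id, total)
```

The keys stay as strings here because `csv.DictReader` does no type inference; pandas would normally parse them as integers.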
 
diff --git 
a/website/generated-content/documentation/dsls/sql/walkthrough/index.html 
b/website/generated-content/documentation/dsls/sql/walkthrough/index.html
index 4287556..a0e6f89 100644
--- a/website/generated-content/documentation/dsls/sql/walkthrough/index.html
+++ b/website/generated-content/documentation/dsls/sql/walkthrough/index.html
@@ -113,8 +113,9 @@ to either a single <code>PCollection</code> or a 
<code>PCollectionTuple</code> w
 </span><span class=c1></span>    <span class=c1>// by joining two PCollections
 </span><span class=c1></span>    <span class=n>PCollection</span><span 
class=o>&lt;</span><span class=n>Row</span><span class=o>&gt;</span> <span 
class=n>output</span> <span class=o>=</span> <span 
class=n>namesAndFoods</span><span class=o>.</span><span 
class=na>apply</span><span class=o>(</span>
         <span class=n>SqlTransform</span><span class=o>.</span><span 
class=na>query</span><span class=o>(</span>
-            <span class=s>&#34;SELECT Names.appId, COUNT(Reviews.rating), 
AVG(Reviews.rating)&#34;</span>
-                <span class=o>+</span> <span class=s>&#34;FROM Apps INNER JOIN 
Reviews ON Apps.appId == Reviews.appId&#34;</span><span class=o>));</span>
+            <span class=s>&#34;SELECT Apps.appId, COUNT(Reviews.rating), 
AVG(Reviews.rating) &#34;</span>
+                <span class=o>+</span> <span class=s>&#34;FROM Apps INNER JOIN 
Reviews ON Apps.appId = Reviews.appId &#34;</span>
+                <span class=o>+</span> <span class=s>&#34;GROUP BY 
Apps.appId&#34;</span><span class=o>));</span>
     </code></pre></div></div></div></p></li></ul><p><a 
href=https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/example/BeamSqlExample.java>BeamSqlExample</a>
 in the code repository shows basic usage of both APIs.</p></div></div><footer 
class=footer><div class=footer__contained><div class=footer__cols><div 
class="footer__cols__col footer__cols__col__logos"><div 
class=footer__cols__col__logo><img src=/images/beam_logo_circle.svg 
class=footer__logo alt="Beam logo"></div><div 
class=footer__cols__col__logo><img src=/images/apache_logo_circle.svg 
class=footer__logo alt="Apache logo"></div></div><div class=footer-wrapper><div 
class=wrapper-grid><div [...]
 <a href=http://www.apache.org>The Apache Software Foundation</a>
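
The SQL change above fixes three things in the walkthrough query: it selects `Apps.appId` instead of `Names.appId` (no table named `Names` appears in the `FROM` clause), it adds trailing spaces so the concatenated fragments no longer run together as `AVG(Reviews.rating)FROM Apps`, and it adds the `GROUP BY` that the aggregates need. As a rough check of the corrected query text, the sketch below runs it against an in-memory SQLite database. SQLite is only a stand-in here and is not the engine Beam SQL uses, and the table contents are hypothetical.

```python
import sqlite3

# The corrected query from the diff, assembled from fragments the same way.
# Note the trailing space at the end of the first two fragments.
QUERY = (
    "SELECT Apps.appId, COUNT(Reviews.rating), AVG(Reviews.rating) "
    "FROM Apps INNER JOIN Reviews ON Apps.appId = Reviews.appId "
    "GROUP BY Apps.appId"
)


def run_example():
    """Run the query on hypothetical data and return rows ordered by appId."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Apps (appId INTEGER);
        CREATE TABLE Reviews (appId INTEGER, rating INTEGER);
        INSERT INTO Apps VALUES (1), (2);
        INSERT INTO Reviews VALUES (1, 5), (1, 3), (2, 4);
        """
    )
    # ORDER BY is added only to make the output order deterministic.
    rows = conn.execute(QUERY + " ORDER BY Apps.appId").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for app_id, review_count, average_rating in run_example():
        print(app_id, review_count, average_rating)
```

Each output row holds an app ID, its number of reviews, and its average rating, which is the per-app summary the walkthrough describes.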
diff --git a/website/generated-content/sitemap.xml 
b/website/generated-content/sitemap.xml
index 0bfeeb6..a962144 100644
--- a/website/generated-content/sitemap.xml
+++ b/website/generated-content/sitemap.xml
@@ -1 +1 @@
-<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset 
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" 
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.33.0/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/blog/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/categories/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/blog/b
 [...]
\ No newline at end of file
+<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset 
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" 
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.33.0/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/blog/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/categories/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/blog/b
 [...]
\ No newline at end of file
