Regenerate website
Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/aa9b7fea
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/aa9b7fea
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/aa9b7fea

Branch: refs/heads/asf-site
Commit: aa9b7fea05f5e1177ff40f7b3977cdfb9ec0dd19
Parents: 8bc6392
Author: Ahmet Altay <al...@google.com>
Authored: Mon Feb 13 12:11:35 2017 -0800
Committer: Ahmet Altay <al...@google.com>
Committed: Mon Feb 13 12:11:35 2017 -0800

----------------------------------------------------------------------
 content/documentation/programming-guide/index.html | 9 ++++-----
 content/get-started/wordcount-example/index.html   | 4 ++--
 2 files changed, 6 insertions(+), 7 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/beam-site/blob/aa9b7fea/content/documentation/programming-guide/index.html
----------------------------------------------------------------------
diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html
index f02fd40..0aa0575 100644
--- a/content/documentation/programming-guide/index.html
+++ b/content/documentation/programming-guide/index.html
@@ -515,7 +515,7 @@ <p class="language-java">Inside your <code class="highlighter-rouge">DoFn</code> subclass, you’ll write a method annotated with <code class="highlighter-rouge">@ProcessElement</code> where you provide the actual processing logic. You don’t need to manually extract the elements from the input collection; the Beam SDKs handle that for you. Your <code class="highlighter-rouge">@ProcessElement</code> method should accept an object of type <code class="highlighter-rouge">ProcessContext</code>.
The <code class="highlighter-rouge">ProcessContext</code> object gives you access to an input element and a method for emitting an output element:</p> -<p class="language-py">Inside your <code class="highlighter-rouge">DoFn</code> subclass, you’ll write a method <code class="highlighter-rouge">process</code> where you provide the actual processing logic. You don’t need to manually extract the elements from the input collection; the Beam SDKs handle that for you. Your <code class="highlighter-rouge">process</code> method should accept an object of type <code class="highlighter-rouge">context</code>. The <code class="highlighter-rouge">context</code> object gives you access to an input element and output is emitted by using <code class="highlighter-rouge">yield</code> or <code class="highlighter-rouge">return</code> statement inside <code class="highlighter-rouge">process</code> method.</p> +<p class="language-py">Inside your <code class="highlighter-rouge">DoFn</code> subclass, you’ll write a method <code class="highlighter-rouge">process</code> where you provide the actual processing logic. You don’t need to manually extract the elements from the input collection; the Beam SDKs handle that for you. Your <code class="highlighter-rouge">process</code> method should accept an object of type <code class="highlighter-rouge">element</code>.
This is the input element and output is emitted by using <code class="highlighter-rouge">yield</code> or <code class="highlighter-rouge">return</code> statement inside <code class="highlighter-rouge">process</code> method.</p> <div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="kd">static</span> <span class="kd">class</span> <span class="nc">ComputeWordLengthFn</span> <span class="kd">extends</span> <span class="n">DoFn</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">></span> <span class="o">{</span> <span class="nd">@ProcessElement</span> @@ -610,11 +610,11 @@ <h4 id="a-nametransforms-gbkausing-groupbykey"><a name="transforms-gbk"></a>Using GroupByKey</h4> -<p><code class="highlighter-rouge">GroupByKey</code> is a Beam transform for processing collections of key/value pairs. It’s a parallel reduction operation, analagous to the Shuffle phase of a Map/Shuffle/Reduce-style algorithm. The input to <code class="highlighter-rouge">GroupByKey</code> is a collection of key/value pairs that represents a <em>multimap</em>, where the collection contains multiple pairs that have the same key, but different values. Given such a collection, you use <code class="highlighter-rouge">GroupByKey</code> to collect all of the values associated with each unique key.</p> +<p><code class="highlighter-rouge">GroupByKey</code> is a Beam transform for processing collections of key/value pairs. It’s a parallel reduction operation, analogous to the Shuffle phase of a Map/Shuffle/Reduce-style algorithm. The input to <code class="highlighter-rouge">GroupByKey</code> is a collection of key/value pairs that represents a <em>multimap</em>, where the collection contains multiple pairs that have the same key, but different values.
Given such a collection, you use <code class="highlighter-rouge">GroupByKey</code> to collect all of the values associated with each unique key.</p> <p><code class="highlighter-rouge">GroupByKey</code> is a good way to aggregate data that has something in common. For example, if you have a collection that stores records of customer orders, you might want to group together all the orders from the same postal code (wherein the “key” of the key/value pair is the postal code field, and the “value” is the remainder of the record).</p> -<p>Let’s examine the mechanics of <code class="highlighter-rouge">GroupByKey</code> with a simple xample case, where our data set consists of words from a text file and the line number on which they appear. We want to group together all the line numbers (values) that share the same word (key), letting us see all the places in the text where a particular word appears.</p> +<p>Let’s examine the mechanics of <code class="highlighter-rouge">GroupByKey</code> with a simple example case, where our data set consists of words from a text file and the line number on which they appear. We want to group together all the line numbers (values) that share the same word (key), letting us see all the places in the text where a particular word appears.</p> <p>Our input is a <code class="highlighter-rouge">PCollection</code> of key/value pairs where each word is a key, and the value is a line number in the file where the word appears.
Here’s a list of the key/value pairs in the input collection:</p> @@ -1046,7 +1046,7 @@ tree, [2] <span class="c"># We can also pass side inputs to a ParDo transform, which will get passed to its process method.</span> -<span class="c"># The only change is that the first arguments are self and a context, rather than the PCollection element itself.</span> +<span class="c"># The first two arguments for the process method would be self and element.</span> <span class="k">class</span> <span class="nc">FilterUsingLength</span><span class="p">(</span><span class="n">beam</span><span class="o">.</span><span class="n">DoFn</span><span class="p">):</span> <span class="k">def</span> <span class="nf">process</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">element</span><span class="p">,</span> <span class="n">lower_bound</span><span class="p">,</span> <span class="n">upper_bound</span><span class="o">=</span><span class="nb">float</span><span class="p">(</span><span class="s">'inf'</span><span class="p">)):</span> @@ -1056,7 +1056,6 @@ tree, [2] <span class="n">small_words</span> <span class="o">=</span> <span class="n">words</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">ParDo</span><span class="p">(</span><span class="n">FilterUsingLength</span><span class="p">(),</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="o">...</span> - </code></pre> </div>

http://git-wip-us.apache.org/repos/asf/beam-site/blob/aa9b7fea/content/get-started/wordcount-example/index.html
----------------------------------------------------------------------
diff --git a/content/get-started/wordcount-example/index.html b/content/get-started/wordcount-example/index.html
index 7fc2f71..0295c31 100644
--- a/content/get-started/wordcount-example/index.html
+++ b/content/get-started/wordcount-example/index.html
@@ -456,8 +456,8 @@ Figure
1: The pipeline data flow.</p> <span class="k">def</span> <span class="nf">expand</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">pcoll</span><span class="p">):</span> <span class="k">return</span> <span class="p">(</span><span class="n">pcoll</span> <span class="c"># Convert lines of text into individual words.</span> - <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">FlatMap</span><span class="p">(</span> - <span class="s">'ExtractWords'</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">re</span><span class="o">.</span><span class="n">findall</span><span class="p">(</span><span class="s">r'[A-Za-z</span><span class="se">\'</span><span class="s">]+'</span><span class="p">,</span> <span class="n">x</span><span class="p">))</span> + <span class="o">|</span> <span class="s">'ExtractWords'</span> <span class="o">>></span> <span class="n">beam</span><span class="o">.</span><span class="n">FlatMap</span><span class="p">(</span> + <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">re</span><span class="o">.</span><span class="n">findall</span><span class="p">(</span><span class="s">r'[A-Za-z</span><span class="se">\'</span><span class="s">]+'</span><span class="p">,</span> <span class="n">x</span><span class="p">))</span> <span class="c"># Count the number of times each word occurs.</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">combiners</span><span class="o">.</span><span class="n">Count</span><span class="o">.</span><span class="n">PerElement</span><span class="p">())</span>
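The programming-guide hunk walks through grouping line numbers (values) by word (key) with GroupByKey. Below is a minimal plain-Python sketch of that grouping. It assumes an in-memory list of lines instead of a Beam PCollection, and it reuses the `[A-Za-z']+` pattern from the `'ExtractWords'` step in the wordcount hunk. The helpers `word_line_pairs` and `group_by_key` are hypothetical illustrations and are not part of the Beam SDK. This sketch keeps values in input order, but Beam makes no guarantee about the order of grouped values.

```python
import re
from collections import defaultdict

# Same word pattern as the 'ExtractWords' FlatMap in the wordcount example.
WORD_RE = re.compile(r"[A-Za-z']+")


def word_line_pairs(lines):
    """Hypothetical helper: yield (word, line_number) key/value pairs, numbering lines from 1."""
    for line_number, line in enumerate(lines, start=1):
        for word in WORD_RE.findall(line):
            yield word, line_number


def group_by_key(pairs):
    """Hypothetical in-memory stand-in for GroupByKey: collect all values seen for each key.

    Unlike Beam, this preserves input order; Beam does not promise any order
    for the values grouped under a key.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


lines = ["the cat sat", "the dog", "a cat and the hat"]
print(group_by_key(word_line_pairs(lines)))
# {'the': [1, 2, 3], 'cat': [1, 3], 'sat': [1], 'dog': [2], 'a': [3], 'and': [3], 'hat': [3]}
```

Each key ends up with every line number on which that word appears. This matches the multimap-to-grouped-collection transformation the guide describes, without any of Beam's distribution across workers.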