Rebuild website after merge
Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/ef362b54 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/ef362b54 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/ef362b54 Branch: refs/heads/asf-site Commit: ef362b546ff4693f08285321c1eb3833abd1434c Parents: 2817c6d Author: Dan Halperin <[email protected]> Authored: Mon Apr 17 10:29:52 2017 -0700 Committer: Dan Halperin <[email protected]> Committed: Mon Apr 17 10:29:52 2017 -0700 ---------------------------------------------------------------------- .../ptransform-style-guide/index.html | 2 +- .../pipelines/design-your-pipeline/index.html | 53 ++++++++------- .../pipelines/test-your-pipeline/index.html | 24 +++++-- .../documentation/programming-guide/index.html | 67 +++++++++---------- .../runners/capability-matrix/index.html | 2 +- .../design-your-pipeline-side-outputs.png | Bin 36451 -> 0 bytes 6 files changed, 80 insertions(+), 68 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/beam-site/blob/ef362b54/content/contribute/ptransform-style-guide/index.html ---------------------------------------------------------------------- diff --git a/content/contribute/ptransform-style-guide/index.html b/content/contribute/ptransform-style-guide/index.html index 66921a3..b6a3619 100644 --- a/content/contribute/ptransform-style-guide/index.html +++ b/content/contribute/ptransform-style-guide/index.html @@ -321,7 +321,7 @@ One advantage of putting a parameter into transform configuration is, it can be <li>If the transform can have unprocessable (permanently failing) records and you want the pipeline to proceed despite that: <ul> <li>If bad records are safe to ignore, count the bad records in a metric. Make sure the transform’s documentation mentions this aggregator. 
Beware that there is no programmatic access to reading the aggregator value from inside the pipeline during execution.</li> - <li>If bad records may need manual inspection by the user, emit them into a side output.</li> + <li>If bad records may need manual inspection by the user, emit them into an output that contains only those records.</li> <li>Alternatively take a (default zero) threshold above which element failures become bundle failures (structure the transform to count the total number of elements and of failed elements, compare them and fail if failures are above the threshold).</li> </ul> </li> http://git-wip-us.apache.org/repos/asf/beam-site/blob/ef362b54/content/documentation/pipelines/design-your-pipeline/index.html ---------------------------------------------------------------------- diff --git a/content/documentation/pipelines/design-your-pipeline/index.html b/content/documentation/pipelines/design-your-pipeline/index.html index 8e0e51f..2185418 100644 --- a/content/documentation/pipelines/design-your-pipeline/index.html +++ b/content/documentation/pipelines/design-your-pipeline/index.html @@ -158,7 +158,7 @@ <li><a href="#a-basic-pipeline" id="markdown-toc-a-basic-pipeline">A basic pipeline</a></li> <li><a href="#branching-pcollections" id="markdown-toc-branching-pcollections">Branching PCollections</a> <ul> <li><a href="#multiple-transforms-process-the-same-pcollection" id="markdown-toc-multiple-transforms-process-the-same-pcollection">Multiple transforms process the same PCollection</a></li> - <li><a href="#a-single-transform-that-uses-side-outputs" id="markdown-toc-a-single-transform-that-uses-side-outputs">A single transform that uses side outputs</a></li> + <li><a href="#a-single-transform-that-produces-multiple-outputs" id="markdown-toc-a-single-transform-that-produces-multiple-outputs">A single transform that produces multiple outputs</a></li> </ul> </li> <li><a href="#merging-pcollections" id="markdown-toc-merging-pcollections">Merging 
PCollections</a></li> @@ -228,18 +228,18 @@ </code></pre> </div> -<h3 id="a-single-transform-that-uses-side-outputs">A single transform that uses side outputs</h3> +<h3 id="a-single-transform-that-produces-multiple-outputs">A single transform that produces multiple outputs</h3> -<p>Another way to branch a pipeline is to have a <strong>single</strong> transform output to multiple <code class="highlighter-rouge">PCollection</code>s by using <a href="/documentation/programming-guide/#transforms-sideio">side outputs</a>. Transforms that use side outputs, process each element of the input once, and allow you to output to zero or more <code class="highlighter-rouge">PCollection</code>s.</p> +<p>Another way to branch a pipeline is to have a <strong>single</strong> transform output to multiple <code class="highlighter-rouge">PCollection</code>s by using <a href="/documentation/programming-guide/#transforms-outputs">tagged outputs</a>. Transforms that produce more than one output process each element of the input once, and output to zero or more <code class="highlighter-rouge">PCollection</code>s.</p> -<p>Figure 3 below illustrates the same example described above, but with one transform that uses a side output; Names that start with “A” are added to the output <code class="highlighter-rouge">PCollection</code>, and names that start with “B” are added to the side output <code class="highlighter-rouge">PCollection</code>.</p> +<p>Figure 3 below illustrates the same example described above, but with one transform that produces multiple outputs. Names that start with “A” are added to the main output <code class="highlighter-rouge">PCollection</code>, and names that start with “B” are added to an additional output <code class="highlighter-rouge">PCollection</code>.</p> <figure id="fig3"> - <img src="/images/design-your-pipeline-side-outputs.png" alt="A pipeline with a transform that outputs multiple PCollections." 
/> + <img src="/images/design-your-pipeline-additional-outputs.png" alt="A pipeline with a transform that outputs multiple PCollections." /> </figure> <p>Figure 3: A pipeline with a transform that outputs multiple PCollections.</p> -<p>The pipeline in Figure 2 contains two transforms that process the elements in the same input <code class="highlighter-rouge">PCollection</code>. One transform uses the following logic pattern:</p> +<p>The pipeline in Figure 2 contains two transforms that process the elements in the same input <code class="highlighter-rouge">PCollection</code>. One transform uses the following logic:</p> <pre>if (starts with 'A') { outputToPCollectionA }</pre> @@ -249,43 +249,46 @@ <p>Because each transform reads the entire input <code class="highlighter-rouge">PCollection</code>, each element in the input <code class="highlighter-rouge">PCollection</code> is processed twice.</p> -<p>The pipeline in Figure 3 performs the same operation in a different way - with only one transform that uses the logic</p> +<p>The pipeline in Figure 3 performs the same operation in a different way - with only one transform that uses the following logic:</p> <pre>if (starts with 'A') { outputToPCollectionA } else if (starts with 'B') { outputToPCollectionB }</pre> <p>where each element in the input <code class="highlighter-rouge">PCollection</code> is processed once. 
See the example code below:</p> -<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="c1">//define main stream and side output</span> -<span class="kd">final</span> <span class="n">TupleTag</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">mainStreamTag</span> <span class="o">=</span> <span class="k">new</span> <span class="n">TupleTag</span><span class="o"><</span><span class="n">String</span><span class="o">>(){};</span> -<span class="kd">final</span> <span class="n">TupleTag</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">sideoutTag</span> <span class="o">=</span> <span class="k">new</span> <span class="n">TupleTag</span><span class="o"><</span><span class="n">String</span><span class="o">>(){};</span> +<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="c1">// Define two TupleTags, one for each output.</span> +<span class="kd">final</span> <span class="n">TupleTag</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">startsWithATag</span> <span class="o">=</span> <span class="k">new</span> <span class="n">TupleTag</span><span class="o"><</span><span class="n">String</span><span class="o">>(){};</span> +<span class="kd">final</span> <span class="n">TupleTag</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">startsWithBTag</span> <span class="o">=</span> <span class="k">new</span> <span class="n">TupleTag</span><span class="o"><</span><span class="n">String</span><span class="o">>(){};</span> <span class="n">PCollectionTuple</span> <span class="n">mixedCollection</span> <span class="o">=</span> <span class="n">dbRowCollection</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span> <span class="n">ParDo</span> - <span class="c1">// Specify the tag for the main output, 
wordsBelowCutoffTag.</span> - <span class="o">.</span><span class="na">withOutputTags</span><span class="o">(</span><span class="n">mainStreamTag</span><span class="o">,</span> - <span class="c1">// Specify the tags for the two side outputs as a TupleTagList.</span> - <span class="n">TupleTagList</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="n">sideoutTag</span><span class="o">))</span> + <span class="c1">// Specify main output. In this example, it is the output</span> + <span class="c1">// with tag startsWithATag.</span> + <span class="o">.</span><span class="na">withOutputTags</span><span class="o">(</span><span class="n">startsWithATag</span><span class="o">,</span> + <span class="c1">// Specify the output with tag startsWithBTag, as a TupleTagList.</span> + <span class="n">TupleTagList</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="n">startsWithBTag</span><span class="o">))</span> <span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="k">new</span> <span class="n">DoFn</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">>()</span> <span class="o">{</span> <span class="nd">@ProcessElement</span> - <span class="kd">public</span> <span class="kt">void</span> <span class="nf">processElement</span><span class="o">(</span><span class="n">ProcessContext</span> <span class="n">c</span><span class="o">)</span> <span class="o">{</span> - <span class="k">if</span><span class="o">(</span><span class="n">c</span><span class="o">.</span><span class="na">element</span><span class="o">().</span><span class="na">startsWith</span><span class="o">(</span><span class="s">"A"</span><span class="o">)){</span><span class="c1">//output to main stream</span> - <span class="n">c</span><span class="o">.</span><span class="na">output</span><span class="o">(</span><span 
class="n">c</span><span class="o">.</span><span class="na">element</span><span class="o">());</span> - <span class="o">}</span><span class="k">else</span> <span class="k">if</span><span class="o">(</span><span class="n">c</span><span class="o">.</span><span class="na">element</span><span class="o">().</span><span class="na">startsWith</span><span class="o">(</span><span class="s">"B"</span><span class="o">)){</span><span class="c1">//emit as Side outputs</span> - <span class="n">c</span><span class="o">.</span><span class="na">sideOutput</span><span class="o">(</span><span class="n">sideoutTag</span><span class="o">,</span> <span class="n">c</span><span class="o">.</span><span class="na">element</span><span class="o">());</span> + <span class="kd">public</span> <span class="kt">void</span> <span class="nf">processElement</span><span class="o">(</span><span class="n">ProcessContext</span> <span class="n">c</span><span class="o">)</span> <span class="o">{</span> + <span class="k">if</span> <span class="o">(</span><span class="n">c</span><span class="o">.</span><span class="na">element</span><span class="o">().</span><span class="na">startsWith</span><span class="o">(</span><span class="s">"A"</span><span class="o">))</span> <span class="o">{</span> + <span class="c1">// Emit to main output, which is the output with tag startsWithATag.</span> + <span class="n">c</span><span class="o">.</span><span class="na">output</span><span class="o">(</span><span class="n">c</span><span class="o">.</span><span class="na">element</span><span class="o">());</span> + <span class="o">}</span> <span class="k">else</span> <span class="k">if</span><span class="o">(</span><span class="n">c</span><span class="o">.</span><span class="na">element</span><span class="o">().</span><span class="na">startsWith</span><span class="o">(</span><span class="s">"B"</span><span class="o">))</span> <span class="o">{</span> + <span class="c1">// Emit to output with tag startsWithBTag.</span> + <span 
class="n">c</span><span class="o">.</span><span class="na">output</span><span class="o">(</span><span class="n">startsWithBTag</span><span class="o">,</span> <span class="n">c</span><span class="o">.</span><span class="na">element</span><span class="o">());</span> + <span class="o">}</span> <span class="o">}</span> <span class="o">}</span> - <span class="o">}</span> <span class="o">));</span> -<span class="c1">// get subset of main stream </span> -<span class="n">mixedCollection</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="n">mainStreamTag</span><span class="o">).</span><span class="na">apply</span><span class="o">(...);</span> +<span class="c1">// Get subset of the output with tag startsWithATag.</span> +<span class="n">mixedCollection</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="n">startsWithATag</span><span class="o">).</span><span class="na">apply</span><span class="o">(...);</span> -<span class="c1">// get subset of Side output</span> -<span class="n">mixedCollection</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="n">sideoutTag</span><span class="o">).</span><span class="na">apply</span><span class="o">(...);</span> +<span class="c1">// Get subset of the output with tag startsWithBTag.</span> +<span class="n">mixedCollection</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="n">startsWithBTag</span><span class="o">).</span><span class="na">apply</span><span class="o">(...);</span> </code></pre> </div> -<p>You can use either mechanism to produce multiple output <code class="highlighter-rouge">PCollection</code>s. However, using side outputs makes more sense if the transformâs computation per element is time-consuming.</p> +<p>You can use either mechanism to produce multiple output <code class="highlighter-rouge">PCollection</code>s. 
However, using additional outputs makes more sense if the transform’s computation per element is time-consuming.</p> <h2 id="merging-pcollections">Merging PCollections</h2> http://git-wip-us.apache.org/repos/asf/beam-site/blob/ef362b54/content/documentation/pipelines/test-your-pipeline/index.html ---------------------------------------------------------------------- diff --git a/content/documentation/pipelines/test-your-pipeline/index.html b/content/documentation/pipelines/test-your-pipeline/index.html index 112700b..22ed9f0 100644 --- a/content/documentation/pipelines/test-your-pipeline/index.html +++ b/content/documentation/pipelines/test-your-pipeline/index.html @@ -157,7 +157,8 @@ <li><a href="#testing-individual-dofn-objects" id="markdown-toc-testing-individual-dofn-objects">Testing Individual DoFn Objects</a> <ul> <li><a href="#creating-a-dofntester" id="markdown-toc-creating-a-dofntester">Creating a DoFnTester</a></li> <li><a href="#creating-test-inputs" id="markdown-toc-creating-test-inputs">Creating Test Inputs</a> <ul> - <li><a href="#side-inputs-and-outputs" id="markdown-toc-side-inputs-and-outputs">Side Inputs and Outputs</a></li> + <li><a href="#side-inputs" id="markdown-toc-side-inputs">Side Inputs</a></li> + <li><a href="#additional-outputs" id="markdown-toc-additional-outputs">Additional Outputs</a></li> </ul> </li> <li><a href="#processing-test-inputs-and-checking-results" id="markdown-toc-processing-test-inputs-and-checking-results">Processing Test Inputs and Checking Results</a></li> @@ -204,7 +205,7 @@ <ol> <li>Create a <code class="highlighter-rouge">DoFnTester</code>. You’ll need to pass an instance of the <code class="highlighter-rouge">DoFn</code> you want to test to the static factory method for <code class="highlighter-rouge">DoFnTester</code>.</li> - <li>Create one or more main test inputs of the appropriate type for your <code class="highlighter-rouge">DoFn</code>. 
If your <code class="highlighter-rouge">DoFn</code> takes side inputs and/or produces side outputs, you should also create the side inputs and the side output tags.</li> + <li>Create one or more main test inputs of the appropriate type for your <code class="highlighter-rouge">DoFn</code>. If your <code class="highlighter-rouge">DoFn</code> takes side inputs and/or produces <a href="/documentation/programming-guide#transforms-outputs">multiple outputs</a>, you should also create the side inputs and the output tags.</li> <li>Call <code class="highlighter-rouge">DoFnTester.processBundle</code> to process the main inputs.</li> <li>Use JUnit’s <code class="highlighter-rouge">Assert.assertThat</code> method to ensure the test outputs returned from <code class="highlighter-rouge">processBundle</code> match your expected values.</li> </ol> @@ -232,7 +233,7 @@ </code></pre> </div> -<h4 id="side-inputs-and-outputs">Side Inputs and Outputs</h4> +<h4 id="side-inputs">Side Inputs</h4> <p>If your <code class="highlighter-rouge">DoFn</code> accepts side inputs, you can create those side inputs by using the method <code class="highlighter-rouge">DoFnTester.setSideInputs</code>.</p> @@ -246,9 +247,18 @@ </code></pre> </div> -<p>If your <code class="highlighter-rouge">DoFn</code> produces side outputs, you’ll need to set the appropriate <code class="highlighter-rouge">TupleTag</code> objects that you’ll use to access each output. 
A <code class="highlighter-rouge">DoFn</code> with side outputs produces a <code class="highlighter-rouge">PCollectionTuple</code> for each side output; you’ll need to provide a <code class="highlighter-rouge">TupleTagList</code> that corresponds to each side output in that tuple.</p> +<p>See the <code class="highlighter-rouge">ParDo</code> documentation on <a href="/documentation/programming-guide/#transforms-sideio">side inputs</a> for more information.</p> + +<h4 id="additional-outputs">Additional Outputs</h4> -<p>Suppose your <code class="highlighter-rouge">DoFn</code> produces side outputs of type <code class="highlighter-rouge">String</code> and <code class="highlighter-rouge">Integer</code>. You create <code class="highlighter-rouge">TupleTag</code> objects for each, and bundle them into a <code class="highlighter-rouge">TupleTagList</code>, then set it for the <code class="highlighter-rouge">DoFnTester</code> as follows:</p> +<p>If your <code class="highlighter-rouge">DoFn</code> produces multiple output <code class="highlighter-rouge">PCollection</code>s, you’ll need to set the +appropriate <code class="highlighter-rouge">TupleTag</code> objects that you’ll use to access each output. A <code class="highlighter-rouge">DoFn</code> +with multiple outputs produces a <code class="highlighter-rouge">PCollectionTuple</code> for each output; you’ll need +to provide a <code class="highlighter-rouge">TupleTagList</code> that corresponds to each output in that tuple.</p> + +<p>Suppose your <code class="highlighter-rouge">DoFn</code> produces outputs of type <code class="highlighter-rouge">String</code> and <code class="highlighter-rouge">Integer</code>. 
You create +<code class="highlighter-rouge">TupleTag</code> objects for each, and bundle them into a <code class="highlighter-rouge">TupleTagList</code>, then set it +for the <code class="highlighter-rouge">DoFnTester</code> as follows:</p> <div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="kd">static</span> <span class="kd">class</span> <span class="nc">MyDoFn</span> <span class="kd">extends</span> <span class="n">DoFn</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">></span> <span class="o">{</span> <span class="o">...</span> <span class="o">}</span> <span class="n">MyDoFn</span> <span class="n">myDoFn</span> <span class="o">=</span> <span class="o">...;</span> @@ -258,11 +268,11 @@ <span class="n">TupleTag</span><span class="o"><</span><span class="n">Integer</span><span class="o">></span> <span class="n">tag2</span> <span class="o">=</span> <span class="o">...;</span> <span class="n">TupleTagList</span> <span class="n">tags</span> <span class="o">=</span> <span class="n">TupleTagList</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="n">tag1</span><span class="o">).</span><span class="na">and</span><span class="o">(</span><span class="n">tag2</span><span class="o">);</span> -<span class="n">fnTester</span><span class="o">.</span><span class="na">setSideOutputTags</span><span class="o">(</span><span class="n">tags</span><span class="o">);</span> +<span class="n">fnTester</span><span class="o">.</span><span class="na">setOutputTags</span><span class="o">(</span><span class="n">tags</span><span class="o">);</span> </code></pre> </div> -<p>See the <code class="highlighter-rouge">ParDo</code> documentation on <a href="/documentation/programming-guide/#transforms-sideio">side inputs</a> for more information.</p> +<p>See the <code class="highlighter-rouge">ParDo</code> documentation on <a 
href="/documentation/programming-guide/#transforms-outputs">additional outputs</a> for more information.</p> <h3 id="processing-test-inputs-and-checking-results">Processing Test Inputs and Checking Results</h3> http://git-wip-us.apache.org/repos/asf/beam-site/blob/ef362b54/content/documentation/programming-guide/index.html ---------------------------------------------------------------------- diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html index 9d0a3b6..8e56108 100644 --- a/content/documentation/programming-guide/index.html +++ b/content/documentation/programming-guide/index.html @@ -189,7 +189,8 @@ <li><a href="#transforms-combine">Using Combine</a></li> <li><a href="#transforms-flatten-partition">Using Flatten and Partition</a></li> <li><a href="#transforms-usercodereqs">General Requirements for Writing User Code for Beam Transforms</a></li> - <li><a href="#transforms-sideio">Side Inputs and Side Outputs</a></li> + <li><a href="#transforms-sideio">Side Inputs</a></li> + <li><a href="#transforms-outputs">Additional Outputs</a></li> </ul> </li> <li><a href="#transforms-composite">Composite Transforms</a></li> @@ -981,9 +982,7 @@ tree, [2] <p>It’s recommended that you make your function object idempotent; that is, it can be repeated or retried as often as necessary without causing unintended side effects. 
The Beam model provides no guarantees as to the number of times your user code might be invoked or retried; as such, keeping your function object idempotent keeps your pipeline’s output deterministic, and your transforms’ behavior more predictable and easier to debug.</p> -<h4 id="a-nametransforms-sideioaside-inputs-and-side-outputs"><a name="transforms-sideio"></a>Side Inputs and Side Outputs</h4> - -<h5 id="side-inputs"><strong>Side inputs</strong></h5> +<h4 id="a-nametransforms-sideioaside-inputs"><a name="transforms-sideio"></a>Side Inputs</h4> <p>In addition to the main input <code class="highlighter-rouge">PCollection</code>, you can provide additional inputs to a <code class="highlighter-rouge">ParDo</code> transform in the form of side inputs. A side input is an additional input that your <code class="highlighter-rouge">DoFn</code> can access each time it processes an element in the input <code class="highlighter-rouge">PCollection</code>. When you specify a side input, you create a view of some other data that can be read from within the <code class="highlighter-rouge">ParDo</code> transform’s <code class="highlighter-rouge">DoFn</code> while processing each element.</p> @@ -1076,50 +1075,50 @@ tree, [2] <p>If the side input has multiple trigger firings, Beam uses the value from the latest trigger firing. This is particularly useful if you use a side input with a single global window and specify a trigger.</p> -<h5 id="side-outputs"><strong>Side outputs</strong></h5> +<h4 id="a-nametransforms-outputsaadditional-outputs"><a name="transforms-outputs"></a>Additional Outputs</h4> -<p>While <code class="highlighter-rouge">ParDo</code> always produces a main output <code class="highlighter-rouge">PCollection</code> (as the return value from apply), you can also have your <code class="highlighter-rouge">ParDo</code> produce any number of additional output <code class="highlighter-rouge">PCollection</code>s. 
If you choose to have multiple outputs, your <code class="highlighter-rouge">ParDo</code> returns all of the output <code class="highlighter-rouge">PCollection</code>s (including the main output) bundled together.</p> +<p>While <code class="highlighter-rouge">ParDo</code> always produces a main output <code class="highlighter-rouge">PCollection</code> (as the return value from <code class="highlighter-rouge">apply</code>), you can also have your <code class="highlighter-rouge">ParDo</code> produce any number of additional output <code class="highlighter-rouge">PCollection</code>s. If you choose to have multiple outputs, your <code class="highlighter-rouge">ParDo</code> returns all of the output <code class="highlighter-rouge">PCollection</code>s (including the main output) bundled together.</p> -<h5 id="tags-for-side-outputs">Tags for side outputs:</h5> +<h5 id="tags-for-muitiple-outputs">Tags for multiple outputs:</h5> -<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="c1">// To emit elements to a side output PCollection, create a TupleTag object to identify each collection that your ParDo produces.</span> -<span class="c1">// For example, if your ParDo produces three output PCollections (the main output and two side outputs), you must create three TupleTags.</span> -<span class="c1">// The following example code shows how to create TupleTags for a ParDo with a main output and two side outputs:</span> +<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="c1">// To emit elements to multiple output PCollections, create a TupleTag object to identify each collection that your ParDo produces.</span> +<span class="c1">// For example, if your ParDo produces three output PCollections (the main output and two additional outputs), you must create three TupleTags.</span> +<span class="c1">// The following example code shows how to create TupleTags for a ParDo with three output PCollections.</span> <span 
class="c1">// Input PCollection to our ParDo.</span> <span class="n">PCollection</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">words</span> <span class="o">=</span> <span class="o">...;</span> <span class="c1">// The ParDo will filter words whose length is below a cutoff and add them to</span> <span class="c1">// the main output PCollection<String>.</span> - <span class="c1">// If a word is above the cutoff, the ParDo will add the word length to a side output</span> - <span class="c1">// PCollection<Integer>.</span> - <span class="c1">// If a word starts with the string "MARKER", the ParDo will add that word to a different</span> - <span class="c1">// side output PCollection<String>.</span> + <span class="c1">// If a word is above the cutoff, the ParDo will add the word length to an</span> + <span class="c1">// output PCollection<Integer>.</span> + <span class="c1">// If a word starts with the string "MARKER", the ParDo will add that word to an</span> + <span class="c1">// output PCollection<String>.</span> <span class="kd">final</span> <span class="kt">int</span> <span class="n">wordLengthCutOff</span> <span class="o">=</span> <span class="mi">10</span><span class="o">;</span> - <span class="c1">// Create the TupleTags for the main and side outputs.</span> - <span class="c1">// Main output.</span> + <span class="c1">// Create three TupleTags, one for each output PCollection.</span> + <span class="c1">// Output that contains words below the length cutoff.</span> <span class="kd">final</span> <span class="n">TupleTag</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">wordsBelowCutOffTag</span> <span class="o">=</span> <span class="k">new</span> <span class="n">TupleTag</span><span class="o"><</span><span class="n">String</span><span class="o">>(){};</span> - <span class="c1">// Word lengths side output.</span> + <span class="c1">// Output that 
contains word 
lengths.</span> <span class="kd">final</span> <span class="n">TupleTag</span><span class="o"><</span><span class="n">Integer</span><span class="o">></span> <span class="n">wordLengthsAboveCutOffTag</span> <span class="o">=</span> <span class="k">new</span> <span class="n">TupleTag</span><span class="o"><</span><span class="n">Integer</span><span class="o">>(){};</span> - <span class="c1">// "MARKER" words side output.</span> + <span class="c1">// Output that contains "MARKER" words.</span> <span class="kd">final</span> <span class="n">TupleTag</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">markedWordsTag</span> <span class="o">=</span> <span class="k">new</span> <span class="n">TupleTag</span><span class="o"><</span><span class="n">String</span><span class="o">>(){};</span> <span class="c1">// Passing Output Tags to ParDo:</span> <span class="c1">// After you specify the TupleTags for each of your ParDo outputs, pass the tags to your ParDo by invoking .withOutputTags.</span> -<span class="c1">// You pass the tag for the main output first, and then the tags for any side outputs in a TupleTagList.</span> -<span class="c1">// Building on our previous example, we pass the three TupleTags (one for the main output and two for the side outputs) to our ParDo.</span> -<span class="c1">// Note that all of the outputs (including the main output PCollection) are bundled into the returned PCollectionTuple.</span> +<span class="c1">// You pass the tag for the main output first, and then the tags for any additional outputs in a TupleTagList.</span> +<span class="c1">// Building on our previous example, we pass the three TupleTags for our three output PCollections</span> +<span class="c1">// to our ParDo. 
Note that all of the outputs (including the main output PCollection) are bundled into the returned PCollectionTuple.</span> <span class="n">PCollectionTuple</span> <span class="n">results</span> <span class="o">=</span> <span class="n">words</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span> <span class="n">ParDo</span> - <span class="c1">// Specify the tag for the main output, wordsBelowCutoffTag.</span> + <span class="c1">// Specify the tag for the main output.</span> <span class="o">.</span><span class="na">withOutputTags</span><span class="o">(</span><span class="n">wordsBelowCutOffTag</span><span class="o">,</span> - <span class="c1">// Specify the tags for the two side outputs as a TupleTagList.</span> + <span class="c1">// Specify the tags for the two additional outputs as a TupleTagList.</span> <span class="n">TupleTagList</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="n">wordLengthsAboveCutOffTag</span><span class="o">)</span> <span class="o">.</span><span class="na">and</span><span class="o">(</span><span class="n">markedWordsTag</span><span class="o">))</span> <span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="k">new</span> <span class="n">DoFn</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">>()</span> <span class="o">{</span> @@ -1152,29 +1151,29 @@ tree, [2] </code></pre> </div> -<h5 id="emitting-to-side-outputs-in-your-dofn">Emitting to side outputs in your DoFn:</h5> +<h5 id="emitting-to-multiple-outputs-in-your-dofn">Emitting to multiple outputs in your DoFn:</h5> -<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="c1">// Inside your ParDo's DoFn, you can emit an element to a side output by using the method ProcessContext.sideOutput.</span> -<span class="c1">// Pass the appropriate TupleTag for the target side output 
collection when you call ProcessContext.sideOutput.</span> -<span class="c1">// After your ParDo, extract the resulting main and side output PCollections from the returned PCollectionTuple.</span> -<span class="c1">// Based on the previous example, this shows the DoFn emitting to the main and side outputs.</span> +<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="c1">// Inside your ParDo's DoFn, you can emit an element to a specific output PCollection by passing in the</span> +<span class="c1">// appropriate TupleTag when you call ProcessContext.output.</span> +<span class="c1">// After your ParDo, extract the resulting output PCollections from the returned PCollectionTuple.</span> +<span class="c1">// Based on the previous example, this shows the DoFn emitting to the main output and two additional outputs.</span> <span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="k">new</span> <span class="n">DoFn</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">>()</span> <span class="o">{</span> <span class="kd">public</span> <span class="kt">void</span> <span class="nf">processElement</span><span class="o">(</span><span class="n">ProcessContext</span> <span class="n">c</span><span class="o">)</span> <span class="o">{</span> <span class="n">String</span> <span class="n">word</span> <span class="o">=</span> <span class="n">c</span><span class="o">.</span><span class="na">element</span><span class="o">();</span> <span class="k">if</span> <span class="o">(</span><span class="n">word</span><span class="o">.</span><span class="na">length</span><span class="o">()</span> <span class="o"><=</span> <span class="n">wordLengthCutOff</span><span class="o">)</span> <span class="o">{</span> - <span class="c1">// Emit this short word to the main output.</span> + <span class="c1">// Emit short word to the main output.</span> + <span 
class="c1">// In this example, it is the output with tag wordsBelowCutOffTag.</span> <span class="n">c</span><span class="o">.</span><span class="na">output</span><span class="o">(</span><span class="n">word</span><span class="o">);</span> <span class="o">}</span> <span class="k">else</span> <span class="o">{</span> - <span class="c1">// Emit this long word's length to a side output.</span> - <span class="n">c</span><span class="o">.</span><span class="na">sideOutput</span><span class="o">(</span><span class="n">wordLengthsAboveCutOffTag</span><span class="o">,</span> <span class="n">word</span><span class="o">.</span><span class="na">length</span><span class="o">());</span> + <span class="c1">// Emit long word length to the output with tag wordLengthsAboveCutOffTag.</span> + <span class="n">c</span><span class="o">.</span><span class="na">output</span><span class="o">(</span><span class="n">wordLengthsAboveCutOffTag</span><span class="o">,</span> <span class="n">word</span><span class="o">.</span><span class="na">length</span><span class="o">());</span> <span class="o">}</span> <span class="k">if</span> <span class="o">(</span><span class="n">word</span><span class="o">.</span><span class="na">startsWith</span><span class="o">(</span><span class="s">"MARKER"</span><span class="o">))</span> <span class="o">{</span> - <span class="c1">// Emit this word to a different side output.</span> - <span class="n">c</span><span class="o">.</span><span class="na">sideOutput</span><span class="o">(</span><span class="n">markedWordsTag</span><span class="o">,</span> <span class="n">word</span><span class="o">);</span> + <span class="c1">// Emit word to the output with tag markedWordsTag.</span> + <span class="n">c</span><span class="o">.</span><span class="na">output</span><span class="o">(</span><span class="n">markedWordsTag</span><span class="o">,</span> <span class="n">word</span><span class="o">);</span> <span class="o">}</span> <span class="o">}}));</span> - </code></pre> 
</div> http://git-wip-us.apache.org/repos/asf/beam-site/blob/ef362b54/content/documentation/runners/capability-matrix/index.html ---------------------------------------------------------------------- diff --git a/content/documentation/runners/capability-matrix/index.html b/content/documentation/runners/capability-matrix/index.html index b1cca24..94cc0e2 100644 --- a/content/documentation/runners/capability-matrix/index.html +++ b/content/documentation/runners/capability-matrix/index.html @@ -1332,7 +1332,7 @@ - <td width="25%" class="cap" style="background-color:#fe5;border-color:#ca1"><center><b>Partially: user-provided metrics</b></center><br />Allow transforms to aggregate simple metrics across bundles in a <tt>DoFn</tt>. Semantically equivalent to using a side output, but support partial results as the transform executes. Will likely want to augment <tt>Aggregators</tt> to be more useful for processing unbounded data by making them windowed. + <td width="25%" class="cap" style="background-color:#fe5;border-color:#ca1"><center><b>Partially: user-provided metrics</b></center><br />Allow transforms to aggregate simple metrics across bundles in a <tt>DoFn</tt>. Semantically equivalent to using an additional output, but support partial results as the transform executes. Will likely want to augment <tt>Aggregators</tt> to be more useful for processing unbounded data by making them windowed. </td> http://git-wip-us.apache.org/repos/asf/beam-site/blob/ef362b54/content/images/design-your-pipeline-side-outputs.png ---------------------------------------------------------------------- diff --git a/content/images/design-your-pipeline-side-outputs.png b/content/images/design-your-pipeline-side-outputs.png deleted file mode 100644 index f13989d..0000000 Binary files a/content/images/design-your-pipeline-side-outputs.png and /dev/null differ
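The routing logic in the programming-guide hunk above (short words to the main output, long-word lengths to a second output, "MARKER" words to a third) can be sketched in plain Java, without Beam. This is only an illustration of the branching behavior, not the Beam API: the class `TaggedOutputSketch`, the `route` helper, and the string tag names are hypothetical stand-ins for `TupleTag`, `ParDo`, and `PCollectionTuple`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; this is NOT the Beam API.
// The class name, the tag strings, and route() are hypothetical.
class TaggedOutputSketch {
  // Stand-ins for the three TupleTags in the diff above.
  static final String WORDS_BELOW_CUT_OFF = "wordsBelowCutOff";
  static final String WORD_LENGTHS_ABOVE_CUT_OFF = "wordLengthsAboveCutOff";
  static final String MARKED_WORDS = "markedWords";

  // Routes each word to one or more tagged outputs, mirroring the
  // if/else structure of the DoFn body shown in the diff.
  static Map<String, List<Object>> route(List<String> words, int wordLengthCutOff) {
    Map<String, List<Object>> outputs = new LinkedHashMap<>();
    outputs.put(WORDS_BELOW_CUT_OFF, new ArrayList<>());
    outputs.put(WORD_LENGTHS_ABOVE_CUT_OFF, new ArrayList<>());
    outputs.put(MARKED_WORDS, new ArrayList<>());

    for (String word : words) {
      if (word.length() <= wordLengthCutOff) {
        // Short word goes to the main output.
        outputs.get(WORDS_BELOW_CUT_OFF).add(word);
      } else {
        // Only the length of a long word is emitted.
        outputs.get(WORD_LENGTHS_ABOVE_CUT_OFF).add(word.length());
      }
      if (word.startsWith("MARKER")) {
        // A marked word is emitted here in addition to the branch above.
        outputs.get(MARKED_WORDS).add(word);
      }
    }
    return outputs;
  }

  public static void main(String[] args) {
    Map<String, List<Object>> results =
        route(Arrays.asList("a", "tree", "MARKER_x", "pipeline"), 4);
    System.out.println(results);
  }
}
```

Note that a single element can land in more than one output: a word starting with "MARKER" is written to the marked-words output and also to whichever of the first two branches it matches.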
