Regenerate website
Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/575e4598
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/575e4598
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/575e4598

Branch: refs/heads/asf-site
Commit: 575e45987aa00dca16ee562441dd071362c873b1
Parents: 466edb3
Author: Davor Bonaci <da...@google.com>
Authored: Wed Feb 15 14:54:18 2017 -0800
Committer: Davor Bonaci <da...@google.com>
Committed: Wed Feb 15 14:54:18 2017 -0800

----------------------------------------------------------------------
 content/blog/2017/02/13/stateful-processing.html | 19 +++++++++++++------
 .../runners/capability-matrix/index.html         | 12 ++++++------
 content/feed.xml                                 | 19 +++++++++++++------
 3 files changed, 32 insertions(+), 18 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/575e4598/content/blog/2017/02/13/stateful-processing.html
----------------------------------------------------------------------
diff --git a/content/blog/2017/02/13/stateful-processing.html b/content/blog/2017/02/13/stateful-processing.html
index 833b2fd..a936f4b 100644
--- a/content/blog/2017/02/13/stateful-processing.html
+++ b/content/blog/2017/02/13/stateful-processing.html
@@ -328,7 +328,7 @@ unique and consistent.
 Before diving into the code for how to do this in a Beam SDK, I’ll go over this
 example from the level of the model. In pictures, you want to write a transform
 that maps input to output like this:</p>
-<p><img class="center-block" src="/images/blog/stateful-processing/assign-indices.png" alt="Assigning arbitrary but unique indices to each element" width="100" /></p>
+<p><img class="center-block" src="/images/blog/stateful-processing/assign-indices.png" alt="Assigning arbitrary but unique indices to each element" width="180" /></p>
 <p>The order of the elements A, B, C, D, E is arbitrary, hence their assigned
 indices are arbitrary, but downstream transforms just need to be OK with this.
@@ -400,10 +400,17 @@ key+window pairs, like this:</p>
 keys and windows are independent dimensions)</p>

 <p>You can provide the opportunity for parallelism by making sure that table has
-enough columns, either via many keys in few windows - for example, a globally
-windowed stateful computation keyed by user ID - or via many windows over few
-keys - for example, a fixed windowed stateful computation over a global key.
-Caveat: all Beam runners today parallelize only over the key.</p>
+enough columns. You might have many keys and many windows, or you might have
+many of just one or the other:</p>
+
+<ul>
+  <li>Many keys in few windows, for example a globally windowed stateful computation
+keyed by user ID.</li>
+  <li>Many windows over few keys, for example a fixed windowed stateful computation
+over a global key.</li>
+</ul>
+
+<p>Caveat: all Beam runners today parallelize only over the key.</p>

 <p>Most often your mental model of state can be focused on only a single column
 of the table, a single key+window pair. Cross-column interactions do not occur
@@ -610,7 +617,7 @@ outputs from the <code class="highlighter-rouge">ParDo</code> that will be proce
 output, then you cannot use a <code class="highlighter-rouge">Filter</code> transform
 to reduce data volume downstream.</p>

 <p>Stateful processing lets you address both the latency problem of side inputs
-and the cost problem of excessive uninterseting output. Here is the code, using
+and the cost problem of excessive uninteresting output. Here is the code, using
 only features I have already introduced:</p>

 <div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="k">new</span> <span class="n">DoFn</span><span class="o"><</span><span class="n">KV</span><span class="o"><</span><span class="n">UserId</span><span class="o">,</span> <span class="n">Event</span><span class="o">>,</span> <span class="n">KV</span><span class="o"><</span><span class="n">UserId</span><span class="o">,</span> <span class="n">Prediction</span><span class="o">>>()</span> <span class="o">{</span>

http://git-wip-us.apache.org/repos/asf/beam-site/blob/575e4598/content/documentation/runners/capability-matrix/index.html
----------------------------------------------------------------------
diff --git a/content/documentation/runners/capability-matrix/index.html b/content/documentation/runners/capability-matrix/index.html
index 60f62b2..88da8eb 100644
--- a/content/documentation/runners/capability-matrix/index.html
+++ b/content/documentation/runners/capability-matrix/index.html
@@ -441,7 +441,7 @@
 </tr>

 <tr class="cap-summary">
-<th class="cap-summary color-capability format-capability" style="color:#ec3">Keyed State</th>
+<th class="cap-summary color-capability format-capability" style="color:#ec3">Stateful Processing</th>
@@ -1353,7 +1353,7 @@
 </tr>

 <tr class="cap">
-<th class="cap color-capability format-capability" style="color:#ec3">Keyed State</th>
+<th class="cap color-capability format-capability" style="color:#ec3">Stateful Processing</th>
@@ -1362,22 +1362,22 @@
-<td width="25%" class="cap" style="background-color:#fe5;border-color:#ca1"><center><b>Partially: non-merging windows</b></center><br />Keyed state is fully supported for non-merging windows.
+<td width="25%" class="cap" style="background-color:#fe5;border-color:#ca1"><center><b>Partially: non-merging windows</b></center><br />State is supported for non-merging windows.
 SetState and MapState are not yet supported.
 </td>

-<td width="25%" class="cap" style="background-color:#fe5;border-color:#ca1"><center><b>Partially: streaming, non-merging windows</b></center><br />Keyed state is supported in streaming mode for non-merging windows.
+<td width="25%" class="cap" style="background-color:#fe5;border-color:#ca1"><center><b>Partially: streaming, non-merging windows</b></center><br />State is supported in streaming mode for non-merging windows.
 SetState and MapState are not yet supported.
 </td>

-<td width="25%" class="cap" style="background-color:#ddd;border-color:#ca1"><center><b>No: not implemented</b></center><br />Spark supports keyed state with mapWithState() so support shuold be straight forward.
+<td width="25%" class="cap" style="background-color:#ddd;border-color:#ca1"><center><b>No: not implemented</b></center><br />Spark supports per-key state with <tt>mapWithState()</tt> so support should be straightforward.
 </td>

-<td width="25%" class="cap" style="background-color:#ddd;border-color:#ca1"><center><b>No: not implemented</b></center><br />Apex supports keyed state, so adding support for this should be easy.
+<td width="25%" class="cap" style="background-color:#ddd;border-color:#ca1"><center><b>No: not implemented</b></center><br />Apex supports per-key state, so adding support for this should be easy.
 </td>
 </tr>

http://git-wip-us.apache.org/repos/asf/beam-site/blob/575e4598/content/feed.xml
----------------------------------------------------------------------
diff --git a/content/feed.xml b/content/feed.xml
index 726cdd0..5c0cd90 100644
--- a/content/feed.xml
+++ b/content/feed.xml
@@ -182,7 +182,7 @@ unique and consistent.
 Before diving into the code for how to do this in a Beam SDK, I’ll go over this
 example from the level of the model. In pictures, you want to write a transform
 that maps input to output like this:</p>
-<p><img class="center-block" src="/images/blog/stateful-processing/assign-indices.png" alt="Assigning arbitrary but unique indices to each element" width="100" /></p>
+<p><img class="center-block" src="/images/blog/stateful-processing/assign-indices.png" alt="Assigning arbitrary but unique indices to each element" width="180" /></p>
 <p>The order of the elements A, B, C, D, E is arbitrary, hence their assigned
 indices are arbitrary, but downstream transforms just need to be OK with this.
@@ -254,10 +254,17 @@ key+window pairs, like this:</p>
 keys and windows are independent dimensions)</p>

 <p>You can provide the opportunity for parallelism by making sure that table has
-enough columns, either via many keys in few windows - for example, a globally
-windowed stateful computation keyed by user ID - or via many windows over few
-keys - for example, a fixed windowed stateful computation over a global key.
-Caveat: all Beam runners today parallelize only over the key.</p>
+enough columns. You might have many keys and many windows, or you might have
+many of just one or the other:</p>
+
+<ul>
+  <li>Many keys in few windows, for example a globally windowed stateful computation
+keyed by user ID.</li>
+  <li>Many windows over few keys, for example a fixed windowed stateful computation
+over a global key.</li>
+</ul>
+
+<p>Caveat: all Beam runners today parallelize only over the key.</p>

 <p>Most often your mental model of state can be focused on only a single column
 of the table, a single key+window pair. Cross-column interactions do not occur
@@ -464,7 +471,7 @@ outputs from the <code class="highlighter-rouge">ParDo</code&
 output, then you cannot use a <code class="highlighter-rouge">Filter</code> transform
 to reduce data volume downstream.</p>

 <p>Stateful processing lets you address both the latency problem of side inputs
-and the cost problem of excessive uninterseting output. Here is the code, using
+and the cost problem of excessive uninteresting output. Here is the code, using
 only features I have already introduced:</p>

 <div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="k">new</span> <span class="n">DoFn</span><span class="o">&lt;</span><span class="n">KV</span><span class="o">&lt;</span><span class="n">UserId</span><span class="o">,</span> <span class="n">Event</span><span class="o">&gt;,</span> <span class="n">KV</span><span class="o">&lt;</span><span class="n">UserId</span><span class="o">,</span> <span class="n">Prediction</span><span class="o">&gt;&gt;()</span> <span class="o">{</span>