Regenerate website
Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/00c736ce Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/00c736ce Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/00c736ce Branch: refs/heads/asf-site Commit: 00c736ce6dd742d2673007d68edb6920e9795f9f Parents: 303fb26 Author: Davor Bonaci <da...@google.com> Authored: Tue Dec 27 17:54:19 2016 -0800 Committer: Davor Bonaci <da...@google.com> Committed: Tue Dec 27 17:54:19 2016 -0800 ---------------------------------------------------------------------- content/documentation/index.html | 14 +++- content/documentation/resources/index.html | 2 +- content/documentation/sdks/java/index.html | 32 +++++++- content/documentation/sdks/python/index.html | 2 - .../mobile-gaming-example/index.html | 37 +++++---- .../get-started/wordcount-example/index.html | 77 ++++++------------- content/images/gaming-example-basic.png | Bin 0 -> 63121 bytes 7 files changed, 88 insertions(+), 76 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/beam-site/blob/00c736ce/content/documentation/index.html ---------------------------------------------------------------------- diff --git a/content/documentation/index.html b/content/documentation/index.html index 1259a66..4767f70 100644 --- a/content/documentation/index.html +++ b/content/documentation/index.html @@ -146,7 +146,7 @@ <div class="row"> <h1 id="apache-beam-documentation">Apache Beam Documentation</h1> -<p>Get in-depth conceptual information and reference material for the Beam Model, SDKs and Runners:</p> +<p>This section provides in-depth conceptual information and reference material for the Beam Model, SDKs, and Runners:</p> <h2 id="concepts">Concepts</h2> @@ -157,6 +157,14 @@ <li>Visit <a href="/documentation/resources/">Additional Resources</a> for some of our favorite articles and talks about Beam.</li> </ul> +<h2 id="pipeline-fundamentals">Pipeline Fundamentals</h2> + +<ul> + <li><a href="/documentation/pipelines/design-your-pipeline/">Design Your Pipeline</a> by planning your pipelineâs structure, choosing transforms to apply to your data, and determining your input and output methods.</li> + <li><a href="/documentation/pipelines/create-your-pipeline/">Create Your Pipeline</a> using the classes in the Beam SDKs.</li> + <li><a href="/documentation/pipelines/test-your-pipeline/">Test Your Pipeline</a> to minimize debugging a pipelineâs remote execution.</li> +</ul> + <h2 id="sdks">SDKs</h2> <p>Find status and reference information on all of the available Beam SDKs.</p> @@ -183,9 +191,9 @@ <h3 id="choosing-a-runner">Choosing a Runner</h3> -<p>Beam is designed to enable pipelines to be portable across different runners. However, given every runner has different capabilities, they also have different abilities to implement the core concepts in the Beam model. The <a href="/documentation/runners/capability-matrix">Capability Matrix</a> provides a detailed comparison of runner functionality.</p> +<p>Beam is designed to enable pipelines to be portable across different runners. However, given every runner has different capabilities, they also have different abilities to implement the core concepts in the Beam model. The <a href="/documentation/runners/capability-matrix/">Capability Matrix</a> provides a detailed comparison of runner functionality.</p> -<p>Once you have chosen which runner to use, see that runnerâs page for more information about any initial runner-specific setup as well as any required or optional <code class="highlighter-rouge">PipelineOptions</code> for configuring itâs execution. You may also want to refer back to the <a href="/get-started/quickstart">Quickstart</a> for instructions on executing the sample WordCount pipeline.</p> +<p>Once you have chosen which runner to use, see that runnerâs page for more information about any initial runner-specific setup as well as any required or optional <code class="highlighter-rouge">PipelineOptions</code> for configuring itâs execution. You may also want to refer back to the <a href="/get-started/quickstart/">Quickstart</a> for instructions on executing the sample WordCount pipeline.</p> </div> http://git-wip-us.apache.org/repos/asf/beam-site/blob/00c736ce/content/documentation/resources/index.html ---------------------------------------------------------------------- diff --git a/content/documentation/resources/index.html b/content/documentation/resources/index.html index 21a707f..9aceea2 100644 --- a/content/documentation/resources/index.html +++ b/content/documentation/resources/index.html @@ -187,7 +187,7 @@ <p>Hadoop Summit, San Jose, CA, 2016</p> -<p>Presented by Davor Bonacci, <em>Apache Beam PPMC member</em></p> +<p>Presented by Davor Bonaci, <em>Apache Beam PPMC member</em></p> <iframe width="560" height="315" src="https://www.youtube.com/embed/7DZ8ONmeP5A" frameborder="0" allowfullscreen=""></iframe> <p><br /></p> http://git-wip-us.apache.org/repos/asf/beam-site/blob/00c736ce/content/documentation/sdks/java/index.html ---------------------------------------------------------------------- diff --git a/content/documentation/sdks/java/index.html b/content/documentation/sdks/java/index.html index 3b3ead4..bd29d00 100644 --- a/content/documentation/sdks/java/index.html +++ b/content/documentation/sdks/java/index.html @@ -146,11 +146,37 @@ <div class="row"> <h1 id="apache-beam-java-sdk">Apache Beam Java SDK</h1> -<p>This page is under construction (<a href="https://issues.apache.org/jira/browse/BEAM-504">BEAM-504</a>).</p> +<p>The Java SDK for Apache Beam provides a simple, powerful API for building both batch and streaming parallel data processing pipelines in Java.</p> -<p>Get started with the <a href="/learn/programming-guide">Beam Programming Guide</a> to learn the basic concepts that hold for all SDKs in the Beam Model.</p> +<h2 id="get-started-with-the-java-sdk">Get Started with the Java SDK</h2> + +<p>Get started with the <a href="/learn/programming-guide/">Beam Programming Model</a> to learn the basic concepts that apply to all SDKs in Beam.</p> + +<p>See the <a href="/learn/sdks/javadoc/">Java API Reference</a> for more information on individual APIs.</p> + +<h2 id="supported-features">Supported Features</h2> + +<p>The Java SDK supports all features currently supported by the Beam model.</p> + +<h2 id="supported-io-connectors">Supported IO Connectors</h2> + +<ul> + <li>Amazon Kinesis</li> + <li>Apache Hadoopâs <code class="highlighter-rouge">FileInputFormat</code> in Hadoop Distributed File System (HDFS)</li> + <li>Apache Kafka</li> + <li>Avro Files</li> + <li>Google BigQuery</li> + <li>Google Cloud Bigtable</li> + <li>Google Cloud Datastore</li> + <li>Google Cloud Pub/Sub</li> + <li>Google Cloud Storage</li> + <li>Java Database Connectivity (JDBC)</li> + <li>Java Message Service (JMS)</li> + <li>MongoDB</li> + <li>Text Files</li> + <li>XML Files</li> +</ul> -<p>See the <a href="/learn/sdks/javadoc/">Java API Reference</a>.</p> </div> http://git-wip-us.apache.org/repos/asf/beam-site/blob/00c736ce/content/documentation/sdks/python/index.html ---------------------------------------------------------------------- diff --git a/content/documentation/sdks/python/index.html b/content/documentation/sdks/python/index.html index 5c39a0d..5816dec 100644 --- a/content/documentation/sdks/python/index.html +++ b/content/documentation/sdks/python/index.html @@ -146,8 +146,6 @@ <div class="row"> <h1 id="apache-beam-python-sdk-under-development">Apache Beam Python SDK <em>[Under Development]</em></h1> -<p>This page is under construction (<a href="https://issues.apache.org/jira/browse/BEAM-977">BEAM-977</a>).</p> - <p>The Beam Python SDK is currently under development on a feature branch. Would you like to contribute? See the Beam <a href="/contribute/work-in-progress/#feature-branches">Work in Progress</a> page to find out how you can help!</p> <p>Get started with the <a href="/learn/programming-guide">Beam Programming Guide</a> to learn the basic concepts that hold for all SDKs in the Beam Model.</p> http://git-wip-us.apache.org/repos/asf/beam-site/blob/00c736ce/content/get-started/mobile-gaming-example/index.html ---------------------------------------------------------------------- diff --git a/content/get-started/mobile-gaming-example/index.html b/content/get-started/mobile-gaming-example/index.html index ead7405..839beb6 100644 --- a/content/get-started/mobile-gaming-example/index.html +++ b/content/get-started/mobile-gaming-example/index.html @@ -196,9 +196,16 @@ <li>A timestamp that records when the particular instance of play happenedâthis is the event time for each game data event.</li> </ul> -<p>When the user completes an instance of the game, their phone sends the data event to a game server, where the data is logged and stored in a file. Generally the data is sent to the game server immediately upon completion. However, users can play the game âofflineâ, when their phones are out of contact with the server (such as on an airplane, or outside network coverage area). When the userâs phone comes back into contact with the game server, the phone will send all accumulated game data.</p> +<p>When the user completes an instance of the game, their phone sends the data event to a game server, where the data is logged and stored in a file. Generally the data is sent to the game server immediately upon completion. However, sometimes delays happen in the network or users play the game âofflineâ, when their phones are out of contact with the server (such as on an airplane, or outside network coverage area). When the userâs phone comes back into contact with the game server, the phone will send all accumulated game data. This means that some data events may arrive delayed and out of order.</p> -<p>This means that some data events might be received by the game server significantly later than users generate them. This time difference can have processing implications for pipelines that make calculations that consider when each score was generated. Such pipelines might track scores generated during each hour of a day, for example, or they calculate the length of time that users are continuously playing the gameâboth of which depend on each data recordâs event time.</p> +<p>The following diagram shows the ideal situation vs reality. The X-axis represents event time: the actual time a game event occurred. The Y-axis represents processing time: the time at which a game event was processed. Ideally, events should be processed as they occur, depicted by the dotted line in the diagram. However, in reality that is not the case and reality looks more like what is depicted by the red squiggly line.</p> + +<figure id="fig1"> + <img src="/images/gaming-example-basic.png" width="264" height="260" alt="Score data for three users." /> +</figure> +<p>Figure 1: Ideally, events are processed when they occur, with no delays.</p> + +<p>The data events might be received by the game server significantly later than users generate them. This time difference (called <strong>skew</strong>) can have processing implications for pipelines that make calculations that consider when each score was generated. Such pipelines might track scores generated during each hour of a day, for example, or they calculate the length of time that users are continuously playing the gameâboth of which depend on each data recordâs event time.</p> <p>Because some of our example pipelines use data files (like logs from the game server) as input, the event timestamp for each game might be embedded in the dataâthat is, itâs a field in each data record. Those pipelines need to parse the event timestamp from each data record after reading it from the input file.</p> @@ -234,12 +241,12 @@ <li>Write the result data to a <a href="https://cloud.google.com/bigquery/">Google Cloud BigQuery</a> table.</li> </ol> -<p>The following diagram shows score data for a several users over the pipeline analysis period. In the diagram, each data point is an event that results in one user/score pair:</p> +<p>The following diagram shows score data for several users over the pipeline analysis period. In the diagram, each data point is an event that results in one user/score pair:</p> -<figure id="fig1"> +<figure id="fig2"> <img src="/images/gaming-example.gif" width="900" height="263" alt="Score data for three users." /> </figure> -<p>Figure 1: Score data for three users.</p> +<p>Figure 2: Score data for three users.</p> <p>This example uses batch processing, and the diagramâs Y axis represents processing time: the pipeline processes events lower on the Y-axis first, and events higher up the axis later. The diagramâs X axis represents the event time for each game event, as denoted by that eventâs timestamp. Note that the individual events in the diagram are not processed by the pipeline in the same order as they occurred (according to their timestamps).</p> @@ -355,10 +362,10 @@ <p>The following diagram shows how the pipeline processes a dayâs worth of a single teamâs scoring data after applying fixed-time windowing:</p> -<figure id="fig2"> +<figure id="fig3"> <img src="/images/gaming-example-team-scores-narrow.gif" width="900" height="390" alt="Score data for two teams." /> </figure> -<p>Figure 2: Score data for two teams. Each teamâs scores are divided into logical windows based on when those scores occurred in event time.</p> +<p>Figure 3: Score data for two teams. Each teamâs scores are divided into logical windows based on when those scores occurred in event time.</p> <p>Notice that as processing time advances, the sums are now <em>per window</em>; each window represents an hour of <em>event time</em> during the day in which the scores occurred.</p> @@ -493,12 +500,12 @@ <p>Because we want all the data that has arrived in the pipeline every time we update our calculation, we have the pipeline consider all of the user score data in a <strong>single global window</strong>. The single global window is unbounded, but we can specify a kind of temporary cut-off point for each ten-minute calculation by using a processing time <a href="/documentation/programming-guide/#triggers">trigger</a>.</p> -<p>When we specify a ten-minute processing time trigger for the single global window, the effect is that the pipeline effectively takes a âsnapshotâ of the contents of the window every time the trigger fires (which it does at ten-minute intervals). Since weâre using a single global window, each snapshot contains all the data collected <em>to that point in time</em>. The following diagram shows the effects of using a processing time trigger on the single global window:</p> +<p>When we specify a ten-minute processing time trigger for the single global window, the pipeline effectively takes a âsnapshotâ of the contents of the window every time the trigger fires. This snapshot happens at ten-minute intervals as long as data has arrived. If no data has arrived, the pipeline will take its next âsnapshotâ 10 minutes past an element arriving. Since weâre using a single global window, each snapshot contains all the data collected <em>to that point in time</em>. The following diagram shows the effects of using a processing time trigger on the single global window:</p> -<figure id="fig3"> +<figure id="fig4"> <img src="/images/gaming-example-proc-time-narrow.gif" width="900" height="263" alt="Score data for for three users." /> </figure> -<p>Figure 3: Score data for for three users. Each userâs scores are grouped together in a single global window, with a trigger that generates a snapshot for output every ten minutes.</p> +<p>Figure 4: Score data for for three users. Each userâs scores are grouped together in a single global window, with a trigger that generates a snapshot for output every ten minutes.</p> <p>As processing time advances and more scores are processed, the trigger outputs the updated sum for each user.</p> @@ -547,12 +554,12 @@ <p>The following diagram shows the relationship between ongoing processing time and each scoreâs event time for two teams:</p> -<figure id="fig4"> +<figure id="fig5"> <img src="/images/gaming-example-event-time-narrow.gif" width="900" height="390" alt="Score data by team, windowed by event time." /> </figure> -<p>Figure 4: Score data by team, windowed by event time. A trigger based on processing time causes the window to emit speculative early results and include late results.</p> +<p>Figure 5: Score data by team, windowed by event time. A trigger based on processing time causes the window to emit speculative early results and include late results.</p> -<p>The dotted line in the diagram is the âidealâ <strong>watermark</strong>: Beamâs notion of when all data in a given window can reasonably considered to have arrived. The irregular solid line represents the actual watermark, as determined by the data source.</p> +<p>The dotted line in the diagram is the âidealâ <strong>watermark</strong>: Beamâs notion of when all data in a given window can reasonably be considered to have arrived. The irregular solid line represents the actual watermark, as determined by the data source.</p> <p>Data arriving above the solid watermark line is <em>late data</em>âthis is a score event that was delayed (perhaps generated offline) and arrived after the window to which it belongs had closed. Our pipelineâs late-firing trigger ensures that this late data is still included in the sum.</p> @@ -698,10 +705,10 @@ <p>The following diagram shows how data might look when grouped into session windows. Unlike fixed windows, session windows are <em>different for each user</em> and is dependent on each individual userâs play pattern:</p> -<figure id="fig5"> +<figure id="fig6"> <img src="/images/gaming-example-session-windows.png" width="662" height="521" alt="User sessions, with a minimum gap duration." /> </figure> -<p>Figure 5: User sessions, with a minimum gap duration. Note how each user has different sessions, according to how many instances they play and how long their breaks between instances are.</p> +<p>Figure 6: User sessions, with a minimum gap duration. Note how each user has different sessions, according to how many instances they play and how long their breaks between instances are.</p> <p>We can use the session-windowed data to determine the average length of uninterrupted play time for all of our users, as well as the total score they achieve during each session. We can do this in the code by first applying session windows, summing the score per user and session, and then using a transform to calculate the length of each individual session:</p> http://git-wip-us.apache.org/repos/asf/beam-site/blob/00c736ce/content/get-started/wordcount-example/index.html ---------------------------------------------------------------------- diff --git a/content/get-started/wordcount-example/index.html b/content/get-started/wordcount-example/index.html index ab8723f..aadd03e 100644 --- a/content/get-started/wordcount-example/index.html +++ b/content/get-started/wordcount-example/index.html @@ -180,10 +180,6 @@ </li> </ul> -<blockquote> - <p><strong>Note:</strong> This walkthrough is still in progress. Detailed instructions for running the example pipelines across multiple runners are yet to be added. There is an open issue to finish the walkthrough (<a href="https://issues.apache.org/jira/browse/BEAM-664">BEAM-664</a>).</p> -</blockquote> - <p>The WordCount examples demonstrate how to set up a processing pipeline that can read text, tokenize the text lines into individual words, and perform a frequency count on each of those words. The Beam SDKs contain a series of these four successively more detailed WordCount examples that build on each other. The input text for all the examples is a set of Shakespeareâs texts.</p> <p>Each WordCount example introduces different concepts in the Beam programming model. Begin by understanding Minimal WordCount, the simplest of the examples. Once you feel comfortable with the basic principles in building a pipeline, continue on to learn more concepts in the other examples.</p> @@ -250,7 +246,7 @@ <p>Each transform takes some kind of input (data or otherwise), and produces some output data. The input and output data is represented by the SDK class <code class="highlighter-rouge">PCollection</code>. <code class="highlighter-rouge">PCollection</code> is a special class, provided by the Beam SDK, that you can use to represent a data set of virtually any size, including infinite data sets.</p> -<p><img src=â/images/wordcount-pipeline.png alt=âWord Count pipeline diagramââ> +<p><img src="/images/wordcount-pipeline.png" alt="Word Count pipeline diagram" /> Figure 1: The pipeline data flow.</p> <p>The Minimal WordCount pipeline contains five transforms:</p> @@ -305,7 +301,7 @@ Figure 1: The pipeline data flow.</p> <li> <p>A text file <code class="highlighter-rouge">Write</code>. This transform takes the final <code class="highlighter-rouge">PCollection</code> of formatted Strings as input and writes each element to an output text file. Each element in the input <code class="highlighter-rouge">PCollection</code> represents one line of text in the resulting output file.</p> - <div class="language-java highlighter-rouge"><pre class="highlight"><code> <span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="n">TextIO</span><span class="o">.</span><span class="na">Write</span><span class="o">.</span><span class="na">to</span><span class="o">(</span><span class="s">"gs://YOUR_OUTPUT_BUCKET/AND_OUTPUT_PREFIX"</span><span class="o">));</span> + <div class="language-java highlighter-rouge"><pre class="highlight"><code> <span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="n">TextIO</span><span class="o">.</span><span class="na">Write</span><span class="o">.</span><span class="na">to</span><span class="o">(</span><span class="s">"wordcounts"</span><span class="o">));</span> </code></pre> </div> </li> @@ -317,10 +313,12 @@ Figure 1: The pipeline data flow.</p> <p>Run the pipeline by calling the <code class="highlighter-rouge">run</code> method, which sends your pipeline to be executed by the pipeline runner that you specified when you created your pipeline.</p> -<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">p</span><span class="o">.</span><span class="na">run</span><span class="o">();</span> +<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">p</span><span class="o">.</span><span class="na">run</span><span class="o">().</span><span class="na">waitUntilFinish</span><span class="o">();</span> </code></pre> </div> +<p>Note that the <code class="highlighter-rouge">run</code> method is asynchronous. For a blocking execution instead, run your pipeline appending the <code class="highlighter-rouge">waitUntilFinish</code> method.</p> + <h2 id="wordcount-example">WordCount Example</h2> <p>This WordCount example introduces a few recommended programming practices that can make your pipeline easier to read, write, and maintain. While not explicitly required, they can make your pipelineâs execution more flexible, aid in testing your pipeline, and help make your pipelineâs code reusable.</p> @@ -465,7 +463,6 @@ Figure 1: The pipeline data flow.</p> <span class="o">}</span> <span class="o">}</span> <span class="o">}</span> - </code></pre> </div> @@ -527,9 +524,6 @@ Figure 1: The pipeline data flow.</p> <span class="n">Options</span> <span class="n">options</span> <span class="o">=</span> <span class="o">...</span> <span class="n">Pipeline</span> <span class="n">pipeline</span> <span class="o">=</span> <span class="n">Pipeline</span><span class="o">.</span><span class="na">create</span><span class="o">(</span><span class="n">options</span><span class="o">);</span> - <span class="cm">/** - * Concept #1: The Beam SDK allows running the same pipeline with a bounded or unbounded input source. - */</span> <span class="n">PCollection</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">input</span> <span class="o">=</span> <span class="n">pipeline</span> <span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="n">TextIO</span><span class="o">.</span><span class="na">Read</span><span class="o">.</span><span class="na">from</span><span class="o">(</span><span class="n">options</span><span class="o">.</span><span class="na">getInputFile</span><span class="o">()))</span> @@ -540,39 +534,30 @@ Figure 1: The pipeline data flow.</p> <p>Each element in a <code class="highlighter-rouge">PCollection</code> has an associated <strong>timestamp</strong>. The timestamp for each element is initially assigned by the source that creates the <code class="highlighter-rouge">PCollection</code> and can be adjusted by a <code class="highlighter-rouge">DoFn</code>. In this example the input is bounded. For the purpose of the example, the <code class="highlighter-rouge">DoFn</code> method named <code class="highlighter-rouge">AddTimestampsFn</code> (invoked by <code class="highlighter-rouge">ParDo</code>) will set a timestamp for each element in the <code class="highlighter-rouge">PCollection</code>.</p> -<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="c1">// Concept #2: Add an element timestamp, using an artificial time just to show windowing.</span> -<span class="c1">// See AddTimestampFn for more details on this.</span> -<span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="n">ParDo</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="k">new</span> <span class="n">AddTimestampFn</span><span class="o">()));</span> +<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="n">ParDo</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="k">new</span> <span class="n">AddTimestampFn</span><span class="o">()));</span> </code></pre> </div> <p>Below is the code for <code class="highlighter-rouge">AddTimestampFn</code>, a <code class="highlighter-rouge">DoFn</code> invoked by <code class="highlighter-rouge">ParDo</code>, that sets the data element of the timestamp given the element itself. For example, if the elements were log lines, this <code class="highlighter-rouge">ParDo</code> could parse the time out of the log string and set it as the elementâs timestamp. There are no timestamps inherent in the works of Shakespeare, so in this case weâve made up random timestamps just to illustrate the concept. Each line of the input text will get a random associated timestamp sometime in a 2-hour period.</p> -<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="cm">/** - * Concept #2: A DoFn that sets the data element timestamp. This is a silly method, just for - * this example, for the bounded data case. Imagine that many ghosts of Shakespeare are all - * typing madly at the same time to recreate his masterworks. Each line of the corpus will - * get a random associated timestamp somewhere in a 2-hour period. - */</span> - <span class="kd">static</span> <span class="kd">class</span> <span class="nc">AddTimestampFn</span> <span class="kd">extends</span> <span class="n">DoFn</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">></span> <span class="o">{</span> - <span class="kd">private</span> <span class="kd">static</span> <span class="kd">final</span> <span class="n">Duration</span> <span class="n">RAND_RANGE</span> <span class="o">=</span> <span class="n">Duration</span><span class="o">.</span><span class="na">standardHours</span><span class="o">(</span><span class="mi">2</span><span class="o">);</span> - <span class="kd">private</span> <span class="kd">final</span> <span class="n">Instant</span> <span class="n">minTimestamp</span><span class="o">;</span> - - <span class="n">AddTimestampFn</span><span class="o">()</span> <span class="o">{</span> - <span class="k">this</span><span class="o">.</span><span class="na">minTimestamp</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Instant</span><span class="o">(</span><span class="n">System</span><span class="o">.</span><span class="na">currentTimeMillis</span><span class="o">());</span> - <span class="o">}</span> +<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="kd">static</span> <span class="kd">class</span> <span class="nc">AddTimestampFn</span> <span class="kd">extends</span> <span class="n">DoFn</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">></span> <span class="o">{</span> + <span class="kd">private</span> <span class="kd">static</span> <span class="kd">final</span> <span class="n">Duration</span> <span class="n">RAND_RANGE</span> <span class="o">=</span> <span class="n">Duration</span><span class="o">.</span><span class="na">standardHours</span><span class="o">(</span><span class="mi">2</span><span class="o">);</span> + <span class="kd">private</span> <span class="kd">final</span> <span class="n">Instant</span> <span class="n">minTimestamp</span><span class="o">;</span> - <span class="nd">@ProcessElement</span> - <span class="kd">public</span> <span class="kt">void</span> <span class="nf">processElement</span><span class="o">(</span><span class="n">ProcessContext</span> <span class="n">c</span><span class="o">)</span> <span class="o">{</span> - <span class="c1">// Generate a timestamp that falls somewhere in the past two hours.</span> - <span class="kt">long</span> <span class="n">randMillis</span> <span class="o">=</span> <span class="o">(</span><span class="kt">long</span><span class="o">)</span> <span class="o">(</span><span class="n">Math</span><span class="o">.</span><span class="na">random</span><span class="o">()</span> <span class="o">*</span> <span class="n">RAND_RANGE</span><span class="o">.</span><span class="na">getMillis</span><span class="o">());</span> - <span class="n">Instant</span> <span class="n">randomTimestamp</span> <span class="o">=</span> <span class="n">minTimestamp</span><span class="o">.</span><span class="na">plus</span><span class="o">(</span><span class="n">randMillis</span><span class="o">);</span> - <span class="cm">/** - * Set the data element with that timestamp. - */</span> - <span class="n">c</span><span class="o">.</span><span class="na">outputWithTimestamp</span><span class="o">(</span><span class="n">c</span><span class="o">.</span><span class="na">element</span><span class="o">(),</span> <span class="k">new</span> <span class="n">Instant</span><span class="o">(</span><span class="n">randomTimestamp</span><span class="o">));</span> - <span class="o">}</span> + <span class="n">AddTimestampFn</span><span class="o">()</span> <span class="o">{</span> + <span class="k">this</span><span class="o">.</span><span class="na">minTimestamp</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Instant</span><span class="o">(</span><span class="n">System</span><span class="o">.</span><span class="na">currentTimeMillis</span><span class="o">());</span> + <span class="o">}</span> + + <span class="nd">@ProcessElement</span> + <span class="kd">public</span> <span class="kt">void</span> <span class="nf">processElement</span><span class="o">(</span><span class="n">ProcessContext</span> <span class="n">c</span><span class="o">)</span> <span class="o">{</span> + <span class="c1">// Generate a timestamp that falls somewhere in the past two hours.</span> + <span class="kt">long</span> <span class="n">randMillis</span> <span class="o">=</span> <span class="o">(</span><span class="kt">long</span><span class="o">)</span> <span class="o">(</span><span class="n">Math</span><span class="o">.</span><span class="na">random</span><span class="o">()</span> <span class="o">*</span> <span class="n">RAND_RANGE</span><span class="o">.</span><span class="na">getMillis</span><span class="o">());</span> + <span class="n">Instant</span> <span class="n">randomTimestamp</span> <span class="o">=</span> <span class="n">minTimestamp</span><span class="o">.</span><span class="na">plus</span><span class="o">(</span><span class="n">randMillis</span><span class="o">);</span> + + <span class="c1">// Set the data element with that timestamp.</span> + <span class="n">c</span><span class="o">.</span><span class="na">outputWithTimestamp</span><span class="o">(</span><span class="n">c</span><span class="o">.</span><span class="na">element</span><span class="o">(),</span> <span class="k">new</span> <span class="n">Instant</span><span class="o">(</span><span class="n">randomTimestamp</span><span class="o">));</span> <span class="o">}</span> +<span class="o">}</span> </code></pre> </div> @@ -582,11 +567,7 @@ Figure 1: The pipeline data flow.</p> <p>The <code class="highlighter-rouge">WindowingWordCount</code> example applies fixed-time windowing, wherein each window represents a fixed time interval. The fixed window size for this example defaults to 1 minute (you can change this with a command-line option).</p> -<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="cm">/** - * Concept #3: Window into fixed windows. The fixed window size for this example defaults to 1 - * minute (you can change this with a command-line option). - */</span> -<span class="n">PCollection</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">windowedWords</span> <span class="o">=</span> <span class="n">input</span> +<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">PCollection</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">windowedWords</span> <span class="o">=</span> <span class="n">input</span> <span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="n">Window</span><span class="o">.<</span><span class="n">String</span><span class="o">></span><span class="n">into</span><span class="o">(</span> <span class="n">FixedWindows</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="n">Duration</span><span class="o">.</span><span class="na">standardMinutes</span><span class="o">(</span><span class="n">options</span><span class="o">.</span><span class="na">getWindowSize</span><span class="o">()))));</span> </code></pre> @@ -596,11 +577,7 @@ Figure 1: The pipeline data flow.</p> <p>You can reuse existing <code class="highlighter-rouge">PTransform</code>s, that were created for manipulating simple <code class="highlighter-rouge">PCollection</code>s, over windowed <code class="highlighter-rouge">PCollection</code>s as well.</p> -<div class="highlighter-rouge"><pre class="highlight"><code>/** - * Concept #4: Re-use our existing CountWords transform that does not have knowledge of - * windows over a PCollection containing windowed values. - */ -PCollection<KV<String, Long>> wordCounts = windowedWords.apply(new WordCount.CountWords()); +<div class="highlighter-rouge"><pre class="highlight"><code>PCollection<KV<String, Long>> wordCounts = windowedWords.apply(new WordCount.CountWords()); </code></pre> </div> @@ -610,11 +587,7 @@ PCollection<KV<String, Long>> wordCounts = windowedWords.apply(new W <p>In this example, we stream the results to a BigQuery table. The results are then formatted for a BigQuery table, and then written to BigQuery using BigQueryIO.Write.</p> -<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="cm">/** - * Concept #5: Format the results for a BigQuery table, then write to BigQuery. - * The BigQuery output source supports both bounded and unbounded data. - */</span> -<span class="n">wordCounts</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="n">ParDo</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="k">new</span> <span class="n">FormatAsTableRowFn</span><span class="o">()))</span> +<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">wordCounts</span><span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="n">ParDo</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="k">new</span> <span class="n">FormatAsTableRowFn</span><span class="o">()))</span> <span class="o">.</span><span class="na">apply</span><span class="o">(</span><span class="n">BigQueryIO</span><span class="o">.</span><span class="na">Write</span> <span class="o">.</span><span class="na">to</span><span class="o">(</span><span class="n">getTableReference</span><span class="o">(</span><span class="n">options</span><span class="o">))</span> <span class="o">.</span><span class="na">withSchema</span><span class="o">(</span><span class="n">getSchema</span><span class="o">())</span> http://git-wip-us.apache.org/repos/asf/beam-site/blob/00c736ce/content/images/gaming-example-basic.png ---------------------------------------------------------------------- diff --git a/content/images/gaming-example-basic.png b/content/images/gaming-example-basic.png new file mode 100644 index 0000000..6f48fe5 Binary files /dev/null and b/content/images/gaming-example-basic.png differ