Regenerate website
Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/e8cb676b Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/e8cb676b Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/e8cb676b Branch: refs/heads/asf-site Commit: e8cb676b0f5f7c4f531f8ad93700a0951a0c791a Parents: f9eb9fc Author: Davor Bonaci <da...@google.com> Authored: Mon Jan 30 23:08:19 2017 -0800 Committer: Davor Bonaci <da...@google.com> Committed: Mon Jan 30 23:08:19 2017 -0800 ---------------------------------------------------------------------- .../2016/10/11/strata-hadoop-world-and-beam.html | 2 +- content/contribute/work-in-progress/index.html | 6 ------ content/documentation/programming-guide/index.html | 16 ++++++++-------- content/documentation/runners/dataflow/index.html | 2 +- content/documentation/runners/direct/index.html | 4 ++-- content/documentation/runners/flink/index.html | 2 +- content/feed.xml | 2 +- content/get-started/quickstart-py/index.html | 4 ++-- 8 files changed, 16 insertions(+), 22 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/beam-site/blob/e8cb676b/content/beam/update/2016/10/11/strata-hadoop-world-and-beam.html ---------------------------------------------------------------------- diff --git a/content/beam/update/2016/10/11/strata-hadoop-world-and-beam.html b/content/beam/update/2016/10/11/strata-hadoop-world-and-beam.html index a02c380..defada4 100644 --- a/content/beam/update/2016/10/11/strata-hadoop-world-and-beam.html +++ b/content/beam/update/2016/10/11/strata-hadoop-world-and-beam.html @@ -166,7 +166,7 @@ <p>The Data Engineers are looking to Beam as a way to <a href="https://www.oreilly.com/ideas/future-proof-and-scale-proof-your-code">future-proof</a>, meaning that code is portable between the various Big Data frameworks. 
In fact, many of the attendees were still on Hadoop MapReduce and looking to transition to a new framework. They're realizing that continually rewriting code isn't the most productive approach.</p> -<p>Data Scientists are really interested in using Beam. They're interested in having a single API for doing analysis instead of several different APIs. We talked about Beam's progress on the Python API. If you want to take a peek, it's being actively developed on a <a href="https://github.com/apache/beam/tree/python-sdk">feature branch</a>. As Beam matures, we're looking to add other supported languages.</p> +<p>Data Scientists are really interested in using Beam. They're interested in having a single API for doing analysis instead of several different APIs. We talked about Beam's progress on the Python API. If you want to take a peek, it's being actively developed on a <a href="https://github.com/apache/beam/tree/master/sdks/python">feature branch</a>. As Beam matures, we're looking to add other supported languages.</p> <p>We heard <a href="https://twitter.com/jessetanderson/status/781124173108305920">loud and clear</a> from Beam users that great runner support is crucial to adoption. We have great Apache Flink support. 
During the conference we had some more volunteers offer their help on the Spark runner.</p> http://git-wip-us.apache.org/repos/asf/beam-site/blob/e8cb676b/content/contribute/work-in-progress/index.html ---------------------------------------------------------------------- diff --git a/content/contribute/work-in-progress/index.html b/content/contribute/work-in-progress/index.html index 992cf46..caa00d5 100644 --- a/content/contribute/work-in-progress/index.html +++ b/content/contribute/work-in-progress/index.html @@ -182,12 +182,6 @@ <td><a href="https://github.com/apache/beam/blob/gearpump-runner/runners/gearpump/README.md">README</a></td> </tr> <tr> - <td>Python SDK</td> - <td><a href="https://github.com/apache/beam/tree/python-sdk">python-sdk</a></td> - <td><a href="https://issues.apache.org/jira/browse/BEAM/component/12328910">sdk-py</a></td> - <td><a href="https://github.com/apache/beam/blob/python-sdk/sdks/python/README.md">README</a></td> - </tr> - <tr> <td>Apache Spark 2.0 Runner</td> <td><a href="https://github.com/apache/beam/tree/runners-spark2">runners-spark2</a></td> <td>-</td> http://git-wip-us.apache.org/repos/asf/beam-site/blob/e8cb676b/content/documentation/programming-guide/index.html ---------------------------------------------------------------------- diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html index 80eee5f..bee08bc 100644 --- a/content/documentation/programming-guide/index.html +++ b/content/documentation/programming-guide/index.html @@ -229,13 +229,13 @@ <h2 id="a-namepipelineacreating-the-pipeline"><a name="pipeline"></a>Creating the pipeline</h2> -<p>The <code class="highlighter-rouge">Pipeline</code> abstraction encapsulates all the data and steps in your data processing task. 
Your Beam driver program typically starts by constructing a <span class="language-java"><a href="/documentation/sdks/javadoc/0.4.0/index.html?org/apache/beam/sdk/Pipeline.html">Pipeline</a></span><span class="language-py"><a href="https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/pipeline.py">Pipeline</a></span> object, and then using that object as the basis for creating the pipeline's data sets as <code class="highlighter-rouge">PCollection</code>s and its operations as <code class="highlighter-rouge">Transform</code>s.</p> +<p>The <code class="highlighter-rouge">Pipeline</code> abstraction encapsulates all the data and steps in your data processing task. Your Beam driver program typically starts by constructing a <span class="language-java"><a href="/documentation/sdks/javadoc/0.4.0/index.html?org/apache/beam/sdk/Pipeline.html">Pipeline</a></span><span class="language-py"><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline.py">Pipeline</a></span> object, and then using that object as the basis for creating the pipeline's data sets as <code class="highlighter-rouge">PCollection</code>s and its operations as <code class="highlighter-rouge">Transform</code>s.</p> <p>To use Beam, your driver program must first create an instance of the Beam SDK class <code class="highlighter-rouge">Pipeline</code> (typically in the <code class="highlighter-rouge">main()</code> function). When you create your <code class="highlighter-rouge">Pipeline</code>, you'll also need to set some <strong>configuration options</strong>. 
You can set your pipeline's configuration options programmatically, but it's often easier to set the options ahead of time (or read them from the command line) and pass them to the <code class="highlighter-rouge">Pipeline</code> object when you create the object.</p> <p>The pipeline configuration options determine, among other things, the <code class="highlighter-rouge">PipelineRunner</code> that determines where the pipeline gets executed: locally, or using a distributed back-end of your choice. Depending on where your pipeline gets executed and what your specified Runner requires, the options can also help you specify other aspects of execution.</p> -<p>To set your pipeline's configuration options and create the pipeline, create an object of type <span class="language-java"><a href="/documentation/sdks/javadoc/0.4.0/index.html?org/apache/beam/sdk/options/PipelineOptions.html">PipelineOptions</a></span><span class="language-py"><a href="https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/utils/pipeline_options.py">PipelineOptions</a></span> and pass it to <code class="highlighter-rouge">Pipeline.Create()</code>. The most common way to do this is by parsing arguments from the command-line:</p> +<p>To set your pipeline's configuration options and create the pipeline, create an object of type <span class="language-java"><a href="/documentation/sdks/javadoc/0.4.0/index.html?org/apache/beam/sdk/options/PipelineOptions.html">PipelineOptions</a></span><span class="language-py"><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/pipeline_options.py">PipelineOptions</a></span> and pass it to <code class="highlighter-rouge">Pipeline.Create()</code>. 
The most common way to do this is by parsing arguments from the command-line:</p> <div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="kd">public</span> <span class="kd">static</span> <span class="kt">void</span> <span class="nf">main</span><span class="o">(</span><span class="n">String</span><span class="o">[]</span> <span class="n">args</span><span class="o">)</span> <span class="o">{</span> <span class="c1">// Will parse the arguments passed into the application and construct a PipelineOptions</span> @@ -655,7 +655,7 @@ tree, [2] <h4 id="a-nametransforms-combineausing-combine"><a name="transforms-combine"></a>Using Combine</h4> -<p><span class="language-java"><a href="/documentation/sdks/javadoc/0.4.0/index.html?org/apache/beam/sdk/transforms/Combine.html"><code class="highlighter-rouge">Combine</code></a></span><span class="language-py"><a href="https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/transforms/core.py"><code class="highlighter-rouge">Combine</code></a></span> is a Beam transform for combining collections of elements or values in your data. <code class="highlighter-rouge">Combine</code> has variants that work on entire <code class="highlighter-rouge">PCollection</code>s, and some that combine the values for each key in <code class="highlighter-rouge">PCollection</code>s of key/value pairs.</p> +<p><span class="language-java"><a href="/documentation/sdks/javadoc/0.4.0/index.html?org/apache/beam/sdk/transforms/Combine.html"><code class="highlighter-rouge">Combine</code></a></span><span class="language-py"><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/core.py"><code class="highlighter-rouge">Combine</code></a></span> is a Beam transform for combining collections of elements or values in your data. 
<code class="highlighter-rouge">Combine</code> has variants that work on entire <code class="highlighter-rouge">PCollection</code>s, and some that combine the values for each key in <code class="highlighter-rouge">PCollection</code>s of key/value pairs.</p> <p>When you apply a <code class="highlighter-rouge">Combine</code> transform, you must provide the function that contains the logic for combining the elements or values. The combining function should be commutative and associative, as the function is not necessarily invoked exactly once on all values with a given key. Because the input data (including the value collection) may be distributed across multiple workers, the combining function might be called multiple times to perform partial combining on subsets of the value collection. The Beam SDK also provides some pre-built combine functions for common numeric combination operations such as sum, min, and max.</p> @@ -852,7 +852,7 @@ tree, [2] <h4 id="a-nametransforms-flatten-partitionausing-flatten-and-partition"><a name="transforms-flatten-partition"></a>Using Flatten and Partition</h4> -<p><span class="language-java"><a href="/documentation/sdks/javadoc/0.4.0/index.html?org/apache/beam/sdk/transforms/Flatten.html"><code class="highlighter-rouge">Flatten</code></a></span><span class="language-py"><a href="https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/transforms/core.py"><code class="highlighter-rouge">Flatten</code></a></span> and <span class="language-java"><a href="/documentation/sdks/javadoc/0.4.0/index.html?org/apache/beam/sdk/transforms/Partition.html"><code class="highlighter-rouge">Partition</code></a></span><span class="language-py"><a href="https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/transforms/core.py"><code class="highlighter-rouge">Partition</code></a></span> are Beam transforms for <code class="highlighter-rouge">PCollection</code> objects that store the same data type. 
<code class="highlighter-rouge">Flatten</code> merges multiple <code class="highlighter-rouge">PCollection</code> objects into a single logical <code class="highlighter-rouge">PCollection</code>, and <code class="highlighter-rouge">Partition</code> splits a single <code class="highlighter-rouge">PCollection</code> into a fixed number of smaller collections.</p> +<p><span class="language-java"><a href="/documentation/sdks/javadoc/0.4.0/index.html?org/apache/beam/sdk/transforms/Flatten.html"><code class="highlighter-rouge">Flatten</code></a></span><span class="language-py"><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/core.py"><code class="highlighter-rouge">Flatten</code></a></span> and <span class="language-java"><a href="/documentation/sdks/javadoc/0.4.0/index.html?org/apache/beam/sdk/transforms/Partition.html"><code class="highlighter-rouge">Partition</code></a></span><span class="language-py"><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/core.py"><code class="highlighter-rouge">Partition</code></a></span> are Beam transforms for <code class="highlighter-rouge">PCollection</code> objects that store the same data type. 
<code class="highlighter-rouge">Flatten</code> merges multiple <code class="highlighter-rouge">PCollection</code> objects into a single logical <code class="highlighter-rouge">PCollection</code>, and <code class="highlighter-rouge">Partition</code> splits a single <code class="highlighter-rouge">PCollection</code> into a fixed number of smaller collections.</p> <h5 id="flatten"><strong>Flatten</strong></h5> @@ -1303,14 +1303,14 @@ tree, [2] <tr> <td>Python</td> <td> - <p><a href="https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/io/avroio.py">avroio</a></p> - <p><a href="https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/io/textio.py">textio</a></p> + <p><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/avroio.py">avroio</a></p> + <p><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/textio.py">textio</a></p> </td> <td> </td> <td> - <p><a href="https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/io/bigquery.py">Google BigQuery</a></p> - <p><a href="https://github.com/apache/beam/tree/python-sdk/sdks/python/apache_beam/io/datastore">Google Cloud Datastore</a></p> + <p><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/bigquery.py">Google BigQuery</a></p> + <p><a href="https://github.com/apache/beam/tree/master/sdks/python/apache_beam/io/datastore">Google Cloud Datastore</a></p> </td> </tr> http://git-wip-us.apache.org/repos/asf/beam-site/blob/e8cb676b/content/documentation/runners/dataflow/index.html ---------------------------------------------------------------------- diff --git a/content/documentation/runners/dataflow/index.html b/content/documentation/runners/dataflow/index.html index 9a0da20..04d1861 100644 --- a/content/documentation/runners/dataflow/index.html +++ b/content/documentation/runners/dataflow/index.html @@ -258,7 +258,7 @@ </tr> </table> -<p>See the reference documentation for the <span 
class="language-java"><a href="/documentation/sdks/javadoc/0.4.0/index.html?org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.html">DataflowPipelineOptions</a></span><span class="language-python"><a href="https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/utils/pipeline_options.py">PipelineOptions</a></span> interface (and its subinterfaces) for the complete list of pipeline configuration options.</p> +<p>See the reference documentation for the <span class="language-java"><a href="/documentation/sdks/javadoc/0.4.0/index.html?org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.html">DataflowPipelineOptions</a></span><span class="language-python"><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/pipeline_options.py">PipelineOptions</a></span> interface (and its subinterfaces) for the complete list of pipeline configuration options.</p> <h2 id="additional-information-and-caveats">Additional information and caveats</h2> http://git-wip-us.apache.org/repos/asf/beam-site/blob/e8cb676b/content/documentation/runners/direct/index.html ---------------------------------------------------------------------- diff --git a/content/documentation/runners/direct/index.html b/content/documentation/runners/direct/index.html index c7a8e2d..631a792 100644 --- a/content/documentation/runners/direct/index.html +++ b/content/documentation/runners/direct/index.html @@ -182,11 +182,11 @@ <p>When executing your pipeline from the command-line, set <code class="highlighter-rouge">runner</code> to <code class="highlighter-rouge">direct</code>. 
The default values for the other pipeline options are generally sufficient.</p> -<p>See the reference documentation for the <span class="language-java"><a href="/documentation/sdks/javadoc/0.4.0/index.html?org/apache/beam/runners/direct/DirectOptions.html"><code class="highlighter-rouge">DirectOptions</code></a></span><span class="language-python"><a href="https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/utils/pipeline_options.py"><code class="highlighter-rouge">PipelineOptions</code></a></span> interface (and its subinterfaces) for defaults and the complete list of pipeline configuration options.</p> +<p>See the reference documentation for the <span class="language-java"><a href="/documentation/sdks/javadoc/0.4.0/index.html?org/apache/beam/runners/direct/DirectOptions.html"><code class="highlighter-rouge">DirectOptions</code></a></span><span class="language-python"><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/pipeline_options.py"><code class="highlighter-rouge">PipelineOptions</code></a></span> interface (and its subinterfaces) for defaults and the complete list of pipeline configuration options.</p> <h2 id="additional-information-and-caveats">Additional information and caveats</h2> -<p>Local execution is limited by the memory available in your local environment. It is highly recommended that you run your pipeline with data sets small enough to fit in local memory. 
You can create a small in-memory data set using a <span class="language-java"><a href="/documentation/sdks/javadoc/0.4.0/index.html?org/apache/beam/sdk/transforms/Create.html"><code class="highlighter-rouge">Create</code></a></span><span class="language-python"><a href="https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/transforms/core.py"><code class="highlighter-rouge">Create</code></a></span> transform, or you can use a <span class="language-java"><a href="/documentation/sdks/javadoc/0.4.0/index.html?org/apache/beam/sdk/io/Read.html"><code class="highlighter-rouge">Read</code></a></span><span class="language-python"><a href="https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/io/iobase.py"><code class="highlighter-rouge">Read</code></a></span> transform to work with small local or remote files.</p> +<p>Local execution is limited by the memory available in your local environment. It is highly recommended that you run your pipeline with data sets small enough to fit in local memory. 
You can create a small in-memory data set using a <span class="language-java"><a href="/documentation/sdks/javadoc/0.4.0/index.html?org/apache/beam/sdk/transforms/Create.html"><code class="highlighter-rouge">Create</code></a></span><span class="language-python"><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/core.py"><code class="highlighter-rouge">Create</code></a></span> transform, or you can use a <span class="language-java"><a href="/documentation/sdks/javadoc/0.4.0/index.html?org/apache/beam/sdk/io/Read.html"><code class="highlighter-rouge">Read</code></a></span><span class="language-python"><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py"><code class="highlighter-rouge">Read</code></a></span> transform to work with small local or remote files.</p> </div> http://git-wip-us.apache.org/repos/asf/beam-site/blob/e8cb676b/content/documentation/runners/flink/index.html ---------------------------------------------------------------------- diff --git a/content/documentation/runners/flink/index.html b/content/documentation/runners/flink/index.html index 1e5137b..e70099f 100644 --- a/content/documentation/runners/flink/index.html +++ b/content/documentation/runners/flink/index.html @@ -273,7 +273,7 @@ </tr> </table> -<p>See the reference documentation for the <span class="language-java"><a href="/documentation/sdks/javadoc/0.4.0/index.html?org/apache/beam/runners/flink/FlinkPipelineOptions.html">FlinkPipelineOptions</a></span><span class="language-python"><a href="https://github.com/apache/beam/blob/python-sdk/sdks/python/apache_beam/utils/pipeline_options.py">PipelineOptions</a></span> interface (and its subinterfaces) for the complete list of pipeline configuration options.</p> +<p>See the reference documentation for the <span class="language-java"><a 
href="/documentation/sdks/javadoc/0.4.0/index.html?org/apache/beam/runners/flink/FlinkPipelineOptions.html">FlinkPipelineOptions</a></span><span class="language-python"><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/pipeline_options.py">PipelineOptions</a></span> interface (and its subinterfaces) for the complete list of pipeline configuration options.</p> <h2 id="additional-information-and-caveats">Additional information and caveats</h2> http://git-wip-us.apache.org/repos/asf/beam-site/blob/e8cb676b/content/feed.xml ---------------------------------------------------------------------- diff --git a/content/feed.xml b/content/feed.xml index bfd68ae..a7e3bf7 100644 --- a/content/feed.xml +++ b/content/feed.xml @@ -441,7 +441,7 @@ Java SDK. If you have questions or comments, we'd love to hear them on the <p>The Data Engineers are looking to Beam as a way to <a href="https://www.oreilly.com/ideas/future-proof-and-scale-proof-your-code">future-proof</a>, meaning that code is portable between the various Big Data frameworks. In fact, many of the attendees were still on Hadoop MapReduce and looking to transition to a new framework. They're realizing that continually rewriting code isn't the most productive approach.</p> -<p>Data Scientists are really interested in using Beam. They're interested in having a single API for doing analysis instead of several different APIs. We talked about Beam's progress on the Python API. If you want to take a peek, it's being actively developed on a <a href="https://github.com/apache/beam/tree/python-sdk">feature branch</a>. As Beam matures, we're looking to add other supported languages.</p> +<p>Data Scientists are really interested in using Beam. They're interested in having a single API for doing analysis instead of several different APIs. We talked about Beam's progress on the Python API. 
If you want to take a peek, it's being actively developed on a <a href="https://github.com/apache/beam/tree/master/sdks/python">feature branch</a>. As Beam matures, we're looking to add other supported languages.</p> <p>We heard <a href="https://twitter.com/jessetanderson/status/781124173108305920">loud and clear</a> from Beam users that great runner support is crucial to adoption. We have great Apache Flink support. During the conference we had some more volunteers offer their help on the Spark runner.</p> http://git-wip-us.apache.org/repos/asf/beam-site/blob/e8cb676b/content/get-started/quickstart-py/index.html ---------------------------------------------------------------------- diff --git a/content/get-started/quickstart-py/index.html b/content/get-started/quickstart-py/index.html index 948c41c..9645c74 100644 --- a/content/get-started/quickstart-py/index.html +++ b/content/get-started/quickstart-py/index.html @@ -219,7 +219,7 @@ environment's directories.</p> <ol> <li> <p>Clone the Apache Beam repo from GitHub: - <code class="highlighter-rouge">git clone https://github.com/apache/beam.git --branch python-sdk</code></p> + <code class="highlighter-rouge">git clone https://github.com/apache/beam.git</code></p> </li> <li> <p>Navigate to the <code class="highlighter-rouge">python</code> directory: @@ -241,7 +241,7 @@ environment's directories.</p> <h2 id="execute-a-pipeline-locally">Execute a pipeline locally</h2> -<p>The Apache Beam <a href="https://github.com/apache/beam/tree/python-sdk/sdks/python/apache_beam/examples">examples</a> directory has many examples. 
All examples can be run locally by passing the required arguments described in the example script.</p> <p>For example, to run <code class="highlighter-rouge">wordcount.py</code>, run:</p>
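The quoted quickstart page ends just before the actual wordcount command, which is not reproduced in this mail. As a rough, Beam-free illustration of the computation the wordcount example performs, the sketch below uses only the Python standard library; the word-splitting regex and the sample lines are illustrative assumptions, and a real run would instead invoke the Beam example script with its documented arguments:

```python
# Beam-free sketch of the word-count computation the wordcount example
# performs. Illustrative only: the real example builds a Beam Pipeline,
# reads an input file, and writes counts to an output file.
import re
from collections import Counter


def count_words(lines):
    """Count word occurrences across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # Tokenize on runs of letters/apostrophes (an illustrative choice).
        counts.update(re.findall(r"[A-Za-z']+", line.lower()))
    return dict(counts)


if __name__ == "__main__":
    sample = ["to be or not to be", "that is the question"]
    print(count_words(sample))
```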