Repository: beam-site
Updated Branches:
  refs/heads/asf-site 9511ebfe8 -> ddbe5b27a


[BEAM-1741] Update direct and Cloud Dataflow runner pages with Python info


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/f23d9cb4
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/f23d9cb4
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/f23d9cb4

Branch: refs/heads/asf-site
Commit: f23d9cb4de49ec99792546420a16b43047aa7b61
Parents: 9511ebf
Author: melissa <[email protected]>
Authored: Mon Apr 17 17:13:33 2017 -0700
Committer: Davor Bonaci <[email protected]>
Committed: Tue Apr 18 15:44:30 2017 -0700

----------------------------------------------------------------------
 src/documentation/runners/dataflow.md | 79 +++++++++++++++++++++++++-----
 src/documentation/runners/direct.md   | 31 +++++++++---
 2 files changed, 91 insertions(+), 19 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/f23d9cb4/src/documentation/runners/dataflow.md
----------------------------------------------------------------------
diff --git a/src/documentation/runners/dataflow.md 
b/src/documentation/runners/dataflow.md
index f2037a2..5eb6b53 100644
--- a/src/documentation/runners/dataflow.md
+++ b/src/documentation/runners/dataflow.md
@@ -6,6 +6,14 @@ redirect_from: /learn/runners/dataflow/
 ---
 # Using the Google Cloud Dataflow Runner
 
+<nav class="language-switcher">
+  <strong>Adapt for:</strong>
+  <ul>
+    <li data-type="language-java" class="active">Java SDK</li>
+    <li data-type="language-py">Python SDK</li>
+  </ul>
+</nav>
+
 The Google Cloud Dataflow Runner uses the [Cloud Dataflow managed 
service](https://cloud.google.com/dataflow/service/dataflow-service-desc). When 
you run your pipeline with the Cloud Dataflow service, the runner uploads your 
executable code and dependencies to a Google Cloud Storage bucket and creates a 
Cloud Dataflow job, which executes your pipeline on managed resources in Google 
Cloud Platform.
 
 The Cloud Dataflow Runner and service are suitable for large scale, continuous 
jobs, and provide:
@@ -40,8 +48,7 @@ For more information, see the *Before you begin* section of 
the [Cloud Dataflow
 
 ### Specify your dependency
 
-You must specify your dependency on the Cloud Dataflow Runner.
-
+<span class="language-java">When using Java, you must specify your dependency 
on the Cloud Dataflow Runner in your `pom.xml`.</span>
 ```java
 <dependency>
   <groupId>org.apache.beam</groupId>
@@ -51,6 +58,8 @@ You must specify your dependency on the Cloud Dataflow Runner.
 </dependency>
 ```
 
+<span class="language-py">This section is not applicable to the Beam SDK for 
Python.</span>
+
 ### Authentication
 
 Before running your pipeline, you must authenticate with the Google Cloud 
Platform. Run the following command to get [Application Default 
Credentials](https://developers.google.com/identity/protocols/application-default-credentials).
@@ -61,7 +70,8 @@ gcloud auth application-default login
 
 ## Pipeline options for the Cloud Dataflow Runner
 
-When executing your pipeline with the Cloud Dataflow Runner, set these 
pipeline options.
+<span class="language-java">When executing your pipeline with the Cloud 
Dataflow Runner (Java), consider these common pipeline options.</span>
+<span class="language-py">When executing your pipeline with the Cloud Dataflow 
Runner (Python), consider these common pipeline options.</span>
 
 <table class="table table-bordered">
 <tr>
@@ -69,39 +79,80 @@ When executing your pipeline with the Cloud Dataflow 
Runner, set these pipeline
   <th>Description</th>
   <th>Default Value</th>
 </tr>
+
 <tr>
   <td><code>runner</code></td>
   <td>The pipeline runner to use. This option allows you to determine the 
pipeline runner at runtime.</td>
-  <td>Set to <code>dataflow</code> to run on the Cloud Dataflow Service.</td>
+  <td>Set to <code>dataflow</code> or <code>DataflowRunner</code> to run on 
the Cloud Dataflow Service.</td>
 </tr>
+
 <tr>
   <td><code>project</code></td>
   <td>The project ID for your Google Cloud Project.</td>
   <td>If not set, defaults to the default project in the current environment. 
The default project is set via <code>gcloud</code>.</td>
 </tr>
-<tr>
+
+<!-- Only show for Java -->
+<tr class="language-java">
   <td><code>streaming</code></td>
   <td>Whether streaming mode is enabled or disabled; <code>true</code> if 
enabled. Set to <code>true</code> if running pipelines with unbounded 
<code>PCollection</code>s.</td>
   <td><code>false</code></td>
 </tr>
+
 <tr>
-  <td><code>tempLocation</code></td>
-  <td>Optional. Path for temporary files. If set to a valid Google Cloud 
Storage URL that begins with <code>gs://</code>, <code>tempLocation</code> is 
used as the default value for <code>gcpTempLocation</code>.</td>
+  <td>
+    <span class="language-java"><code>tempLocation</code></span>
+    <span class="language-py"><code>temp_location</code></span>
+  </td>
+  <td>
+    <span class="language-java">Optional.</span>
+    <span class="language-py">Required.</span>
+    Path for temporary files. Must be a valid Google Cloud Storage URL that 
begins with <code>gs://</code>.
+    <span class="language-java">If set, <code>tempLocation</code> is used as 
the default value for <code>gcpTempLocation</code>.</span>
+  </td>
   <td>No default value.</td>
 </tr>
-<tr>
+
+<!-- Only show for Java -->
+<tr class="language-java">
   <td><code>gcpTempLocation</code></td>
   <td>Cloud Storage bucket path for temporary files. Must be a valid Cloud 
Storage URL that begins with <code>gs://</code>.</td>
   <td>If not set, defaults to the value of <code>tempLocation</code>, provided 
that <code>tempLocation</code> is a valid Cloud Storage URL. If 
<code>tempLocation</code> is not a valid Cloud Storage URL, you must set 
<code>gcpTempLocation</code>.</td>
 </tr>
+
 <tr>
-  <td><code>stagingLocation</code></td>
+  <td>
+    <span class="language-java"><code>stagingLocation</code></span>
+    <span class="language-py"><code>staging_location</code></span>
+  </td>
   <td>Optional. Cloud Storage bucket path for staging your binary and any 
temporary files. Must be a valid Cloud Storage URL that begins with 
<code>gs://</code>.</td>
-  <td>If not set, defaults to a staging directory within 
<code>gcpTempLocation</code>.</td>
+  <td>
+    <span class="language-java">If not set, defaults to a staging directory 
within <code>gcpTempLocation</code>.</span>
+    <span class="language-py">If not set, defaults to a staging directory 
within <code>temp_location</code>.</span>
+  </td>
 </tr>
+
+<!-- Only show for Python -->
+<tr class="language-py">
+  <td><code>save_main_session</code></td>
+  <td>Save the main session state so that pickled functions and classes 
defined in <code>__main__</code> (e.g. in an interactive session) can be unpickled. 
Some workflows do not need the session state if, for instance, all of their 
functions/classes are defined in proper modules (not <code>__main__</code>) and 
the modules are importable in the worker.</td>
+  <td><code>false</code></td>
+</tr>
+
+<!-- Only show for Python -->
+<tr class="language-py">
+  <td><code>sdk_location</code></td>
+  <td>Override the default location from which the Beam SDK is downloaded. 
This value can be a URL, a Cloud Storage path, or a local path to an SDK 
tarball. Workflow submissions will download or copy the SDK tarball from this 
location. If set to the string <code>default</code>, a standard SDK location is 
used. If empty, no SDK is copied.</td>
+  <td><code>default</code></td>
+</tr>
+
+
 </table>
 
-See the reference documentation for the  <span 
class="language-java">[DataflowPipelineOptions]({{ site.baseurl 
}}/documentation/sdks/javadoc/{{ site.release_latest 
}}/index.html?org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.html)</span><span
 
class="language-python">[PipelineOptions](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/pipeline_options.py)</span>
 interface (and its subinterfaces) for the complete list of pipeline 
configuration options.
+See the reference documentation for the
+<span class="language-java">[DataflowPipelineOptions]({{ site.baseurl 
}}/documentation/sdks/javadoc/{{ site.release_latest 
}}/index.html?org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.html)</span>
+<span class="language-py">[`PipelineOptions`]({{ site.baseurl 
}}/documentation/sdks/pydoc/{{ site.release_latest 
}}/apache_beam.utils.html#apache_beam.utils.pipeline_options.PipelineOptions)</span>
+interface (and any subinterfaces) for additional pipeline configuration 
options.
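
The Python option names in the table above are usually passed as command-line flags. The sketch below is a hypothetical helper, not part of the Beam API; the project and bucket names are placeholders.

```python
# Hypothetical helper (not part of the Beam API): assemble the command-line
# flags the Cloud Dataflow Runner expects from the Python SDK. Option names
# use snake_case, matching the table above.
def dataflow_args(project, temp_location, staging_location=None,
                  save_main_session=False):
    args = [
        "--runner=DataflowRunner",
        "--project=" + project,
        "--temp_location=" + temp_location,
    ]
    if staging_location:
        args.append("--staging_location=" + staging_location)
    if save_main_session:
        args.append("--save_main_session")
    return args

# Placeholder project and bucket names.
args = dataflow_args("my-gcp-project", "gs://my-bucket/tmp",
                     staging_location="gs://my-bucket/staging")
print(args)
```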
 
 ## Additional information and caveats
 
@@ -111,8 +162,10 @@ While your pipeline executes, you can monitor the job's 
progress, view details o
 
 ### Blocking Execution
 
-To connect to your job and block until it is completed, call `waitToFinish` on 
the `PipelineResult` returned from `pipeline.run()`. The Cloud Dataflow Runner 
prints job status updates and console messages while it waits. While the result 
is connected to the active job, note that pressing **Ctrl+C** from the command 
line does not cancel your job. To cancel the job, you can use the [Dataflow 
Monitoring 
Interface](https://cloud.google.com/dataflow/pipelines/dataflow-monitoring-intf)
 or the [Dataflow Command-line 
Interface](https://cloud.google.com/dataflow/pipelines/dataflow-command-line-intf).
+To block until your job completes, call <span 
class="language-java"><code>waitUntilFinish</code></span><span 
class="language-py"><code>wait_until_finish</code></span> on the 
`PipelineResult` returned from `pipeline.run()`. The Cloud Dataflow Runner 
prints job status updates and console messages while it waits. While the result 
is connected to the active job, note that pressing **Ctrl+C** from the command 
line does not cancel your job. To cancel the job, you can use the [Dataflow 
Monitoring 
Interface](https://cloud.google.com/dataflow/pipelines/dataflow-monitoring-intf)
 or the [Dataflow Command-line 
Interface](https://cloud.google.com/dataflow/pipelines/dataflow-command-line-intf).
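
The blocking pattern described above can be sketched with a stand-in result object; `FakePipelineResult` below is a stub for illustration only — in a real pipeline you would call the wait method on the `PipelineResult` that `pipeline.run()` returns.

```python
# Stand-in stub for the object pipeline.run() returns; illustration only.
class FakePipelineResult:
    def __init__(self):
        self.state = "RUNNING"

    def wait_until_finish(self):
        # In the real runner this call blocks, printing job status updates,
        # until the Dataflow job reaches a terminal state.
        self.state = "DONE"
        return self.state

result = FakePipelineResult()      # stands in for pipeline.run()
final_state = result.wait_until_finish()
print(final_state)                 # a terminal state such as DONE
```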
 
 ### Streaming Execution
 
-If your pipeline uses an unbounded data source or sink, you must set the 
`streaming` option to `true`.
+<span class="language-java">If your pipeline uses an unbounded data source or 
sink, you must set the `streaming` option to `true`.</span>
+<span class="language-py">The Beam SDK for Python does not currently support 
streaming pipelines.</span>
+

http://git-wip-us.apache.org/repos/asf/beam-site/blob/f23d9cb4/src/documentation/runners/direct.md
----------------------------------------------------------------------
diff --git a/src/documentation/runners/direct.md 
b/src/documentation/runners/direct.md
index babe4cb..0e01c5c 100644
--- a/src/documentation/runners/direct.md
+++ b/src/documentation/runners/direct.md
@@ -6,6 +6,14 @@ redirect_from: /learn/runners/direct/
 ---
 # Using the Direct Runner
 
+<nav class="language-switcher">
+  <strong>Adapt for:</strong>
+  <ul>
+    <li data-type="language-java" class="active">Java SDK</li>
+    <li data-type="language-py">Python SDK</li>
+  </ul>
+</nav>
+
 The Direct Runner executes pipelines on your machine and is designed to 
validate that pipelines adhere to the Apache Beam model as closely as possible. 
Instead of focusing on efficient pipeline execution, the Direct Runner performs 
additional checks to ensure that users do not rely on semantics that are not 
guaranteed by the model. Some of these checks include:
 
 * enforcing immutability of elements
@@ -16,14 +24,20 @@ The Direct Runner executes pipelines on your machine and is 
designed to validate
 Using the Direct Runner for testing and development helps ensure that 
pipelines are robust across different Beam runners. In addition, debugging 
failed runs can be a non-trivial task when a pipeline executes on a remote 
cluster. Instead, it is often faster and simpler to perform local unit testing 
on your pipeline code. Unit testing your pipeline locally also allows you to 
use your preferred local debugging tools.
 
 Here are some resources with information about how to test your pipelines.
-* [Testing Unbounded Pipelines in Apache Beam]({{ site.baseurl 
}}/blog/2016/10/20/test-stream.html) talks about the use of Java classes 
[`PAssert`]({{ site.baseurl }}/documentation/sdks/javadoc/{{ 
site.release_latest }}/index.html?org/apache/beam/sdk/testing/PAssert.html) and 
[`TestStream`]({{ site.baseurl }}/documentation/sdks/javadoc/{{ 
site.release_latest }}/index.html?org/apache/beam/sdk/testing/TestStream.html) 
to test your pipelines.
-* The [Apache Beam WordCount Example]({{ site.baseurl 
}}/get-started/wordcount-example/) contains an example of logging and testing a 
pipeline with [`PAssert`]({{ site.baseurl }}/documentation/sdks/javadoc/{{ 
site.release_latest }}/index.html?org/apache/beam/sdk/testing/PAssert.html).
+<ul>
+  <!-- Java specific links -->
+  <li class="language-java"><a href="{{ site.baseurl 
}}/blog/2016/10/20/test-stream.html">Testing Unbounded Pipelines in Apache 
Beam</a> talks about the use of Java classes <a href="{{ site.baseurl 
}}/documentation/sdks/javadoc/{{ site.release_latest 
}}/index.html?org/apache/beam/sdk/testing/PAssert.html">PAssert</a> and <a 
href="{{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest 
}}/index.html?org/apache/beam/sdk/testing/TestStream.html">TestStream</a> to 
test your pipelines.</li>
+  <li class="language-java">The <a href="{{ site.baseurl 
}}/get-started/wordcount-example/#testing-your-pipeline-via-passert">Apache 
Beam WordCount Example</a> contains an example of logging and testing a 
pipeline with <a href="{{ site.baseurl }}/documentation/sdks/javadoc/{{ 
site.release_latest 
}}/index.html?org/apache/beam/sdk/testing/PAssert.html"><code>PAssert</code></a>.</li>
 
+  <!-- Python specific links -->
+  <li class="language-py">You can use <a 
href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/util.py#L206">assert_that</a>
 to test your pipeline. The Python <a 
href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount_debugging.py">WordCount
 Debugging Example</a> contains an example of logging and testing with <a 
href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/util.py#L206"><code>assert_that</code></a>.</li>
+</ul>
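
To illustrate the order-insensitive comparison that `assert_that` performs with an `equal_to`-style matcher, here is a plain-Python stand-in, not the Beam API itself:

```python
# Plain-Python stand-in for Beam's equal_to matcher: compare a pipeline's
# output against an expected collection, ignoring element order.
def equal_to(expected):
    def check(actual):
        assert sorted(actual) == sorted(expected), \
            "%r does not match %r" % (actual, expected)
    return check

output = ["or", "to", "be", "not", "to", "be"]   # pretend pipeline output
equal_to(["be", "be", "not", "or", "to", "to"])(output)  # passes
```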
 
 ## Direct Runner prerequisites and setup
 
-You must specify your dependency on the Direct Runner.
+### Specify your dependency
 
+<span class="language-java">When using Java, you must specify your dependency 
on the Direct Runner in your `pom.xml`.</span>
 ```java
 <dependency>
    <groupId>org.apache.beam</groupId>
@@ -33,13 +47,18 @@ You must specify your dependency on the Direct Runner.
 </dependency>
 ```
 
+<span class="language-py">This section is not applicable to the Beam SDK for 
Python.</span>
+
 ## Pipeline options for the Direct Runner
 
-When executing your pipeline from the command-line, set `runner` to `direct`. 
The default values for the other pipeline options are generally sufficient.
+When executing your pipeline from the command-line, set `runner` to `direct` 
or `DirectRunner`. The default values for the other pipeline options are 
generally sufficient.
 
-See the reference documentation for the  <span 
class="language-java">[`DirectOptions`]({{ site.baseurl 
}}/documentation/sdks/javadoc/{{ site.release_latest 
}}/index.html?org/apache/beam/runners/direct/DirectOptions.html)</span><span 
class="language-python">[`PipelineOptions`](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/pipeline_options.py)</span>
 interface (and its subinterfaces) for defaults and the complete list of 
pipeline configuration options.
+See the reference documentation for the
+<span class="language-java">[`DirectOptions`]({{ site.baseurl 
}}/documentation/sdks/javadoc/{{ site.release_latest 
}}/index.html?org/apache/beam/runners/direct/DirectOptions.html)</span>
+<span class="language-py">[`DirectOptions`]({{ site.baseurl 
}}/documentation/sdks/pydoc/{{ site.release_latest 
}}/apache_beam.utils.html#apache_beam.utils.pipeline_options.DirectOptions)</span>
+interface for defaults and additional pipeline configuration options.
 
 ## Additional information and caveats
 
-Local execution is limited by the memory available in your local environment. 
It is highly recommended that you run your pipeline with data sets small enough 
to fit in local memory. You can create a small in-memory data set using a <span 
class="language-java">[`Create`]({{ site.baseurl 
}}/documentation/sdks/javadoc/{{ site.release_latest 
}}/index.html?org/apache/beam/sdk/transforms/Create.html)</span><span 
class="language-python">[`Create`](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/core.py)</span>
 transform, or you can use a <span class="language-java">[`Read`]({{ 
site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest 
}}/index.html?org/apache/beam/sdk/io/Read.html)</span><span 
class="language-python">[`Read`](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py)</span>
 transform to work with small local or remote files.
+Local execution is limited by the memory available in your local environment. 
It is highly recommended that you run your pipeline with data sets small enough 
to fit in local memory. You can create a small in-memory data set using a <span 
class="language-java">[`Create`]({{ site.baseurl 
}}/documentation/sdks/javadoc/{{ site.release_latest 
}}/index.html?org/apache/beam/sdk/transforms/Create.html)</span><span 
class="language-py">[`Create`](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/core.py)</span>
 transform, or you can use a <span class="language-java">[`Read`]({{ 
site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest 
}}/index.html?org/apache/beam/sdk/io/Read.html)</span><span 
class="language-py">[`Read`](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py)</span>
 transform to work with small local or remote files.
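
As a conceptual sketch of keeping data small enough for local memory, the list below stands in for a `Create`-style in-memory input; with the Beam Python SDK the equivalent would be passing such a list to the `Create` transform inside a pipeline.

```python
# A small in-memory data set, the kind you would hand to a Create-style
# transform when testing locally with the Direct Runner.
lines = ["to be or not to be", "that is the question"]

# A simple local computation over it (splitting lines into words),
# standing in for the transforms a pipeline would apply.
words = [word for line in lines for word in line.split()]
print(len(words))
```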
 
