Regenerate website

Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/e627b278
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/e627b278
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/e627b278

Branch: refs/heads/asf-site
Commit: e627b27880ea4b7159063de5f0eab1bdd59a511b
Parents: 2dd2c59
Author: Ahmet Altay <al...@google.com>
Authored: Fri Feb 10 12:05:21 2017 -0800
Committer: Ahmet Altay <al...@google.com>
Committed: Fri Feb 10 12:05:21 2017 -0800

----------------------------------------------------------------------
 .../python-pipeline-dependencies/index.html     | 316 +++++++++++++++++++
 content/documentation/sdks/python/index.html    |   3 +
 2 files changed, 319 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/e627b278/content/documentation/sdks/python-pipeline-dependencies/index.html
----------------------------------------------------------------------
diff --git a/content/documentation/sdks/python-pipeline-dependencies/index.html 
b/content/documentation/sdks/python-pipeline-dependencies/index.html
new file mode 100644
index 0000000..4107f5d
--- /dev/null
+++ b/content/documentation/sdks/python-pipeline-dependencies/index.html
@@ -0,0 +1,316 @@
+<!DOCTYPE html>
+<html lang="en">
+
+  <head>
+  <meta charset="utf-8">
+  <meta http-equiv="X-UA-Compatible" content="IE=edge">
+  <meta name="viewport" content="width=device-width, initial-scale=1">
+
+  <title>Managing Python Pipeline Dependencies</title>
+  <meta name="description" content="Apache Beam is an open source, unified 
model and set of language-specific SDKs for defining and executing data 
processing workflows, and also data ingestion and integration flows, supporting 
Enterprise Integration Patterns (EIPs) and Domain Specific Languages (DSLs). 
Dataflow pipelines simplify the mechanics of large-scale batch and streaming 
data processing and can run on a number of runtimes like Apache Flink, Apache 
Spark, and Google Cloud Dataflow (a cloud service). Beam also brings DSL in 
different languages, allowing users to easily implement their data integration 
processes.
+">
+
+  <link rel="stylesheet" href="/styles/site.css">
+  <link rel="stylesheet" href="/css/theme.css">
+  <script 
src="https://ajax.googleapis.com/ajax/libs/jquery/2.2.0/jquery.min.js"></script>
+  <script src="/js/bootstrap.min.js"></script>
+  <script src="/js/language-switch.js"></script>
+  <link rel="canonical" 
href="https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/" 
data-proofer-ignore>
+  <link rel="alternate" type="application/rss+xml" title="Apache Beam" 
href="https://beam.apache.org/feed.xml">
+  <script>
+    
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+    (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
+    
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+    
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+    ga('create', 'UA-73650088-1', 'auto');
+    ga('send', 'pageview');
+
+  </script>
+  <link rel="shortcut icon" type="image/x-icon" href="/images/favicon.ico">
+</head>
+
+
+  <body role="document">
+
+    <nav class="navbar navbar-default navbar-fixed-top">
+  <div class="container">
+    <div class="navbar-header">
+      <a href="/" class="navbar-brand" >
+        <img alt="Brand" style="height: 25px" 
src="/images/beam_logo_navbar.png">
+      </a>
+      <button type="button" class="navbar-toggle collapsed" 
data-toggle="collapse" data-target="#navbar" aria-expanded="false" 
aria-controls="navbar">
+        <span class="sr-only">Toggle navigation</span>
+        <span class="icon-bar"></span>
+        <span class="icon-bar"></span>
+        <span class="icon-bar"></span>
+      </button>
+    </div>
+    <div id="navbar" class="navbar-collapse collapse">
+      <ul class="nav navbar-nav">
+        <li class="dropdown">
+                 <a href="#" class="dropdown-toggle" data-toggle="dropdown" 
role="button" aria-haspopup="true" aria-expanded="false">Get Started <span 
class="caret"></span></a>
+                 <ul class="dropdown-menu">
+                         <li><a href="/get-started/beam-overview/">Beam 
Overview</a></li>
+        <li><a href="/get-started/quickstart-java/">Quickstart - Java</a></li>
+        <li><a href="/get-started/quickstart-py/">Quickstart - Python</a></li>
+                         <li role="separator" class="divider"></li>
+                         <li class="dropdown-header">Example Walkthroughs</li>
+                         <li><a 
href="/get-started/wordcount-example/">WordCount</a></li>
+                         <li><a 
href="/get-started/mobile-gaming-example/">Mobile Gaming</a></li>
+              <li role="separator" class="divider"></li>
+              <li class="dropdown-header">Resources</li>
+              <li><a href="/get-started/downloads">Downloads</a></li>
+              <li><a href="/get-started/support">Support</a></li>
+                 </ul>
+           </li>
+        <li class="dropdown">
+                 <a href="#" class="dropdown-toggle" data-toggle="dropdown" 
role="button" aria-haspopup="true" aria-expanded="false">Documentation <span 
class="caret"></span></a>
+                 <ul class="dropdown-menu">
+                         <li><a href="/documentation">Using the 
Documentation</a></li>
+                         <li role="separator" class="divider"></li>
+                         <li class="dropdown-header">Beam Concepts</li>
+                         <li><a 
href="/documentation/programming-guide/">Programming Guide</a></li>
+                         <li><a href="/documentation/resources/">Additional 
Resources</a></li>
+                         <li role="separator" class="divider"></li>
+              <li class="dropdown-header">Pipeline Fundamentals</li>
+              <li><a 
href="/documentation/pipelines/design-your-pipeline/">Design Your 
Pipeline</a></li>
+              <li><a 
href="/documentation/pipelines/create-your-pipeline/">Create Your 
Pipeline</a></li>
+              <li><a href="/documentation/pipelines/test-your-pipeline/">Test 
Your Pipeline</a></li>
+              <li role="separator" class="divider"></li>
+                         <li class="dropdown-header">SDKs</li>
+                         <li><a href="/documentation/sdks/java/">Java 
SDK</a></li>
+                         <li><a href="/documentation/sdks/javadoc/0.5.0/" 
target="_blank">Java SDK API Reference <img src="/images/external-link-icon.png"
+                 width="14" height="14"
+                 alt="External link."></a>
+        </li>
+        <li><a href="/documentation/sdks/python/">Python SDK</a></li>
+                         <li role="separator" class="divider"></li>
+                         <li class="dropdown-header">Runners</li>
+                         <li><a 
href="/documentation/runners/capability-matrix/">Capability Matrix</a></li>
+                         <li><a href="/documentation/runners/direct/">Direct 
Runner</a></li>
+                         <li><a href="/documentation/runners/apex/">Apache 
Apex Runner</a></li>
+                         <li><a href="/documentation/runners/flink/">Apache 
Flink Runner</a></li>
+                         <li><a href="/documentation/runners/spark/">Apache 
Spark Runner</a></li>
+                         <li><a href="/documentation/runners/dataflow/">Cloud 
Dataflow Runner</a></li>
+                 </ul>
+           </li>
+        <li class="dropdown">
+                 <a href="#" class="dropdown-toggle" data-toggle="dropdown" 
role="button" aria-haspopup="true" aria-expanded="false">Contribute <span 
class="caret"></span></a>
+                 <ul class="dropdown-menu">
+                         <li><a href="/contribute">Get Started 
Contributing</a></li>
+        <li role="separator" class="divider"></li>
+        <li class="dropdown-header">Guides</li>
+                         <li><a 
href="/contribute/contribution-guide/">Contribution Guide</a></li>
+        <li><a href="/contribute/testing/">Testing Guide</a></li>
+        <li><a href="/contribute/release-guide/">Release Guide</a></li>
+        <li><a href="/contribute/ptransform-style-guide/">PTransform Style 
Guide</a></li>
+        <li role="separator" class="divider"></li>
+        <li class="dropdown-header">Technical References</li>
+        <li><a href="/contribute/design-principles/">Design Principles</a></li>
+                         <li><a href="/contribute/work-in-progress/">Ongoing 
Projects</a></li>
+        <li><a href="/contribute/source-repository/">Source 
Repository</a></li>      
+        <li role="separator" class="divider"></li>
+                         <li class="dropdown-header">Promotion</li>
+        <li><a href="/contribute/presentation-materials/">Presentation 
Materials</a></li>
+        <li><a href="/contribute/logos/">Logos and Design</a></li>
+        <li role="separator" class="divider"></li>
+        <li><a href="/contribute/maturity-model/">Maturity Model</a></li>
+        <li><a href="/contribute/team/">Team</a></li>
+                 </ul>
+           </li>
+
+        <li><a href="/blog">Blog</a></li>
+      </ul>
+      <ul class="nav navbar-nav navbar-right">
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown" 
role="button" aria-haspopup="true" aria-expanded="false"><img 
src="https://www.apache.org/foundation/press/kit/feather_small.png" alt="Apache 
Logo" style="height:24px;">Apache Software Foundation<span 
class="caret"></span></a>
+          <ul class="dropdown-menu dropdown-menu-right">
+            <li><a href="http://www.apache.org/">ASF Homepage</a></li>
+            <li><a href="http://www.apache.org/licenses/">License</a></li>
+            <li><a href="http://www.apache.org/security/">Security</a></li>
+            <li><a 
href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+            <li><a 
href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li>
+            <li><a 
href="https://www.apache.org/foundation/policies/conduct";>Code of 
Conduct</a></li>
+          </ul>
+        </li>
+      </ul>
+    </div><!--/.nav-collapse -->
+  </div>
+</nav>
+
+
+<link rel="stylesheet" href="">
+
+
+    <div class="container" role="main">
+
+      <div class="row">
+        <h1 id="managing-python-pipeline-dependencies">Managing Python 
Pipeline Dependencies</h1>
+
+<blockquote>
+  <p><strong>Note:</strong> This page is only applicable to runners that do 
remote execution.</p>
+</blockquote>
+
+<p>When you run your pipeline locally, the packages that your pipeline depends 
on are available because they are installed on your local machine. However, 
when you want to run your pipeline remotely, you must make sure these 
dependencies are available on the remote machines. This tutorial shows you how 
to make your dependencies available to the remote workers. Each section below 
refers to a different source that your package may have been installed from.</p>
+
+<p><strong>Note:</strong> Remote workers used for pipeline execution typically 
have a standard Python 2.7 installation. If your code relies only on packages 
from the Python standard library, you probably don’t need to do anything on 
this page.</p>
+
+<h2 id="a-namepypiapypi-dependencies"><a name="pypi"></a>PyPI Dependencies</h2>
+
+<p>If your pipeline uses public packages from the <a 
href="https://pypi.python.org/pypi">Python Package Index</a>, make these 
packages available remotely by performing the following steps:</p>
+
+<p><strong>Note:</strong> If your PyPI package depends on a non-Python package 
(e.g. a package that requires installation on Linux using the <code 
class="highlighter-rouge">apt-get install</code> command), see the <a 
href="#nonpython">PyPI Dependencies with Non-Python Dependencies</a> section 
instead.</p>
+
+<ol>
+  <li>
+    <p>Find out which packages are installed on your machine. Run the 
following command:</p>
+
+    <div class="highlighter-rouge"><pre class="highlight"><code> pip freeze 
&gt; requirements.txt
+</code></pre>
+    </div>
+
+    <p>This command creates a <code 
class="highlighter-rouge">requirements.txt</code> file that lists all packages 
that are installed on your machine, regardless of where they were installed 
from.</p>
+  </li>
+  <li>
+    <p>Edit the <code class="highlighter-rouge">requirements.txt</code> file 
and leave only the packages that were installed from PyPI and are used in the 
workflow source. Delete all packages that are not relevant to your code.</p>
+  </li>
+  <li>
+    <p>Run your pipeline with the following command-line option:</p>
+
+    <div class="highlighter-rouge"><pre class="highlight"><code> 
--requirements_file requirements.txt
+</code></pre>
+    </div>
+
+    <p>The runner will use the <code 
class="highlighter-rouge">requirements.txt</code> file to install your 
additional dependencies onto the remote workers.</p>
+  </li>
+</ol>
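+
+<p>Step 2 above can be tedious to do by hand. The following is a minimal 
sketch, not part of the Beam SDK, that keeps only the <code 
class="highlighter-rouge">pip freeze</code> lines for packages you name. The 
helper names <code class="highlighter-rouge">requirement_name</code> and <code 
class="highlighter-rouge">filter_requirements</code> are hypothetical, and the 
sketch assumes each line starts with the package name.</p>

```python
import re


def requirement_name(line):
    """Return the normalized project name at the start of a requirement line.

    Returns None for lines that do not start with a name, such as
    editable installs beginning with "-e".
    """
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line.strip())
    if not match:
        return None
    # Treat runs of "-", "_" and "." as equivalent and ignore case.
    return re.sub(r"[-_.]+", "-", match.group(0)).lower()


def filter_requirements(freeze_lines, used_packages):
    """Keep only the freeze lines whose package is listed in used_packages."""
    wanted = {re.sub(r"[-_.]+", "-", name).lower() for name in used_packages}
    return [
        line.strip()
        for line in freeze_lines
        if requirement_name(line) in wanted
    ]
```

+<p>You would still review the result by hand, since the sketch cannot tell 
which packages your workflow actually imports.</p>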
+
+<p><strong>Important:</strong> Remote workers will install all packages listed 
in the <code class="highlighter-rouge">requirements.txt</code> file. Because of 
this, it’s very important that you delete non-PyPI packages from the <code 
class="highlighter-rouge">requirements.txt</code> file, as stated in step 2. If 
you don’t remove non-PyPI packages, the remote workers will fail when 
attempting to install packages from sources that are unknown to them.</p>
+
+<h2 id="a-namelocalnonpypialocal-or-non-pypi-dependencies"><a 
name="localnonpypi"></a>Local or non-PyPI Dependencies</h2>
+
+<p>If your pipeline uses packages that are not publicly available on PyPI (e.g. 
packages that you’ve downloaded from a GitHub repo), make these packages 
available remotely by performing the following steps:</p>
+
+<ol>
+  <li>
+    <p>Identify which packages are installed on your machine and are not 
public. Run the following command:</p>
+
+    <div class="highlighter-rouge"><pre class="highlight"><code> pip freeze
+</code></pre>
+    </div>
+
+    <p>This command lists all packages that are installed on your machine, 
regardless of where they were installed from.</p>
+  </li>
+  <li>
+    <p>Run your pipeline with the following command-line option:</p>
+
+    <div class="highlighter-rouge"><pre class="highlight"><code> 
--extra_package /path/to/package/package-name
+</code></pre>
+    </div>
+  </li>
+</ol>
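+
+<p>As a rough aid for step 1, the sketch below flags <code 
class="highlighter-rouge">pip freeze</code> lines that look like they did not 
come from a package index. It is a heuristic under stated assumptions: 
editable installs are printed with a leading <code 
class="highlighter-rouge">-e</code>, and direct URL or file installs are 
printed as <code class="highlighter-rouge">name @ location</code>. Some pip 
versions print local installs as plain <code 
class="highlighter-rouge">name==version</code>, so a manual check is still 
needed. The function names are hypothetical.</p>

```python
def is_non_index_entry(line):
    """Heuristically detect a freeze line that is not a plain index pin."""
    stripped = line.strip()
    # "-e ..." marks an editable install; "name @ url" marks a direct reference.
    return stripped.startswith("-e ") or " @ " in stripped


def non_index_packages(freeze_lines):
    """Return the freeze lines that likely need --extra_package handling."""
    return [line.strip() for line in freeze_lines if is_non_index_entry(line)]
```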
+
+<h2 id="a-namemultfilesamultiple-file-dependencies"><a 
name="multfiles"></a>Multiple File Dependencies</h2>
+
+<p>Often, your pipeline code spans multiple files. To run your project 
remotely, you must group these files as a Python package and specify the 
package when you run your pipeline. When the remote workers start, they will 
install your package. To group your files as a Python package and make it 
available remotely, perform the following steps:</p>
+
+<ol>
+  <li>
+    <p>Create a <a 
href="https://pythonhosted.org/an_example_pypi_project/setuptools.html">setup.py</a>
 file for your project. The following is a very basic <code 
class="highlighter-rouge">setup.py</code> file.</p>
+
+    <div class="highlighter-rouge"><pre class="highlight"><code> import setuptools
+
+ setuptools.setup(
+    name='PACKAGE-NAME',
+    version='PACKAGE-VERSION',
+    install_requires=[],
+    packages=setuptools.find_packages(),
+ )
+</code></pre>
+    </div>
+  </li>
+  <li>
+    <p>Structure your project so that the root directory contains the <code 
class="highlighter-rouge">setup.py</code> file, the main workflow file, and a 
directory with the rest of the files.</p>
+
+    <div class="highlighter-rouge"><pre class="highlight"><code> root_dir/
+   setup.py
+   main.py
+   other_files_dir/
+</code></pre>
+    </div>
+
+    <p>See <a 
href="https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/complete/juliaset">Juliaset</a>
 for an example that follows this required project structure.</p>
+  </li>
+  <li>
+    <p>Run your pipeline with the following command-line option:</p>
+
+    <div class="highlighter-rouge"><pre class="highlight"><code> --setup_file 
/path/to/setup.py
+</code></pre>
+    </div>
+  </li>
+</ol>
+
+<p><strong>Note:</strong> If you <a href="#pypi">created a requirements.txt 
file</a> and your project spans multiple files, you can drop the <code 
class="highlighter-rouge">requirements.txt</code> file and instead list its 
packages in the <code class="highlighter-rouge">install_requires</code> field 
of the <code class="highlighter-rouge">setuptools.setup()</code> call (in step 
1).</p>
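+
+<p>One way to move the entries over is to parse the text of <code 
class="highlighter-rouge">requirements.txt</code> into a list and pass that 
list as <code class="highlighter-rouge">install_requires</code>. The sketch 
below is an illustration only. The helper name <code 
class="highlighter-rouge">parse_requirements</code> is hypothetical, and it 
assumes a simple file with one requirement per line, skipping comments and pip 
option lines such as <code class="highlighter-rouge">-e</code>.</p>

```python
def parse_requirements(text):
    """Turn the text of a simple requirements file into a list of specifiers."""
    requirements = []
    for raw_line in text.splitlines():
        # Drop trailing comments, then surrounding whitespace.
        line = raw_line.split(" #", 1)[0].strip()
        # Skip blanks, full-line comments, and pip options such as "-e" or "-r".
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        requirements.append(line)
    return requirements
```

+<p>In a <code class="highlighter-rouge">setup.py</code> file you could then 
read the file contents and pass the resulting list to <code 
class="highlighter-rouge">install_requires</code>.</p>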
+
+<h2 
id="a-namenonpythonanon-python-dependencies-or-pypi-dependencies-with-non-python-dependencies"><a
 name="nonpython"></a>Non-Python Dependencies or PyPI Dependencies with 
Non-Python Dependencies</h2>
+
+<p>If your pipeline uses non-Python packages (e.g. packages that require 
installation using the <code class="highlighter-rouge">apt-get install</code> 
command), or uses a PyPI package that depends on non-Python dependencies during 
package installation, you must perform the following steps.</p>
+
+<ol>
+  <li>
+    <p>Add the required installation commands (e.g. the <code 
class="highlighter-rouge">apt-get install</code> commands) for the non-Python 
dependencies to the list of <code 
class="highlighter-rouge">CUSTOM_COMMANDS</code> in your <code 
class="highlighter-rouge">setup.py</code> file. See the <a 
href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/juliaset/setup.py">Juliaset
 setup.py</a> for an example.</p>
+
+    <p><strong>Note:</strong> You must make sure that these commands are 
runnable on the remote worker (e.g. if you use <code 
class="highlighter-rouge">apt-get</code>, the remote worker needs <code 
class="highlighter-rouge">apt-get</code> support).</p>
+  </li>
+  <li>
+    <p>If you are using a PyPI package that depends on non-Python 
dependencies, add <code class="highlighter-rouge">['pip', 'install', '&lt;your 
PyPI package&gt;']</code> to the list of <code 
class="highlighter-rouge">CUSTOM_COMMANDS</code> in your <code 
class="highlighter-rouge">setup.py</code> file.</p>
+  </li>
+  <li>
+    <p>Structure your project so that the root directory contains the <code 
class="highlighter-rouge">setup.py</code> file, the main workflow file, and a 
directory with the rest of the files.</p>
+
+    <div class="highlighter-rouge"><pre class="highlight"><code> root_dir/
+   setup.py
+   main.py
+   other_files_dir/
+</code></pre>
+    </div>
+
+    <p>See the <a 
href="https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/complete/juliaset">Juliaset</a>
 project for an example that follows this required project structure.</p>
+  </li>
+  <li>
+    <p>Run your pipeline with the following command-line option:</p>
+
+    <div class="highlighter-rouge"><pre class="highlight"><code> --setup_file 
/path/to/setup.py
+</code></pre>
+    </div>
+  </li>
+</ol>
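+
+<p>To make steps 1 and 2 concrete, the sketch below shows what a <code 
class="highlighter-rouge">CUSTOM_COMMANDS</code> list might look like, with a 
small runner that executes each command in order. The package names are 
illustrative assumptions, and <code 
class="highlighter-rouge">run_custom_commands</code> is a hypothetical helper. 
Wiring the commands into the build is done in the Juliaset <code 
class="highlighter-rouge">setup.py</code>, which remains the reference.</p>

```python
import subprocess

# Each entry is one command, written as an argument list.
# The package names below are examples only, not requirements of Beam.
CUSTOM_COMMANDS = [
    ["apt-get", "update"],
    ["apt-get", "--assume-yes", "install", "libsnappy-dev"],
    ["pip", "install", "python-snappy"],
]


def run_custom_commands(commands, runner=subprocess.check_call):
    """Run each command in order, stopping at the first failure.

    With the default runner, a non-zero exit status raises
    subprocess.CalledProcessError.
    """
    for command in commands:
        runner(command)
```

+<p>Passing a different <code class="highlighter-rouge">runner</code> lets you 
check the command list locally without installing anything.</p>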
+
+<p><strong>Note:</strong> Because custom commands execute after the 
dependencies for your workflow are installed (by <code 
class="highlighter-rouge">pip</code>), you should omit the PyPI package 
dependency from the pipeline’s <code 
class="highlighter-rouge">requirements.txt</code> file and from the <code 
class="highlighter-rouge">install_requires</code> parameter in the <code 
class="highlighter-rouge">setuptools.setup()</code> call of your <code 
class="highlighter-rouge">setup.py</code> file.</p>
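+
+<p>The command-line options described on this page can also be assembled in 
code before they are handed to your pipeline. The following sketch builds the 
argument list from the three options above. The function name <code 
class="highlighter-rouge">dependency_args</code> is hypothetical, and repeating 
<code class="highlighter-rouge">--extra_package</code> once per package is an 
assumption about how the option is parsed.</p>

```python
def dependency_args(requirements_file=None, extra_packages=(), setup_file=None):
    """Build the dependency-related command-line arguments for a pipeline."""
    args = []
    if requirements_file:
        args += ["--requirements_file", requirements_file]
    # Assumption: the option may be given once per local package.
    for package in extra_packages:
        args += ["--extra_package", package]
    if setup_file:
        args += ["--setup_file", setup_file]
    return args
```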
+
+
+      </div>
+
+
+    <hr>
+  <div class="row">
+      <div class="col-xs-12">
+          <footer>
+              <p class="text-center">
+                &copy; Copyright
+                <a href="http://www.apache.org">The Apache Software 
Foundation</a>,
+                2017. All Rights Reserved.
+              </p>
+              <p class="text-center">
+                <a href="/privacy_policy">Privacy Policy</a> |
+                <a href="/feed.xml">RSS Feed</a>
+              </p>
+          </footer>
+      </div>
+  </div>
+  <!-- container div end -->
+</div>
+
+
+  </body>
+
+</html>

http://git-wip-us.apache.org/repos/asf/beam-site/blob/e627b278/content/documentation/sdks/python/index.html
----------------------------------------------------------------------
diff --git a/content/documentation/sdks/python/index.html 
b/content/documentation/sdks/python/index.html
index 3924ebe..6ae91b0 100644
--- a/content/documentation/sdks/python/index.html
+++ b/content/documentation/sdks/python/index.html
@@ -160,6 +160,9 @@
 
 <p>Python is a dynamically-typed language with no static type checking. The 
Beam SDK for Python uses type hints during pipeline construction and runtime to 
try to emulate the correctness guarantees achieved by true static typing. <a 
href="/documentation/sdks/python-type-safety">Ensuring Python Type Safety</a> 
walks through how to use type hints, which help you to catch potential bugs up 
front with the <a href="/documentation/runners/direct/">Direct Runner</a>.</p>
 
+<h2 id="managing-python-pipeline-dependencies">Managing Python Pipeline 
Dependencies</h2>
+
+<p>When you run your pipeline locally, the packages that your pipeline depends 
on are available because they are installed on your local machine. However, 
when you want to run your pipeline remotely, you must make sure these 
dependencies are available on the remote machines. <a 
href="/documentation/sdks/python-pipeline-dependencies">Managing Python 
Pipeline Dependencies</a> shows you how to make your dependencies available to 
the remote workers.</p>
 
       </div>
 
