This is an automated email from the ASF dual-hosted git repository.
git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/asf-site by this push:
new bb4f46e Publishing website 2020/12/18 00:01:29 at commit 1542971
bb4f46e is described below
commit bb4f46e76b191ce55dd619f2bd079a1a3f05bffb
Author: jenkins <[email protected]>
AuthorDate: Fri Dec 18 00:01:29 2020 +0000
Publishing website 2020/12/18 00:01:29 at commit 1542971
---
website/generated-content/documentation/index.xml | 265 ++++++++++++++-------
.../io/built-in/google-bigquery/index.html | 4 +-
.../documentation/runtime/environments/index.html | 149 ++++++++----
website/generated-content/sitemap.xml | 2 +-
4 files changed, 285 insertions(+), 135 deletions(-)
diff --git a/website/generated-content/documentation/index.xml
b/website/generated-content/documentation/index.xml
index e45b820..a8a66a1 100644
--- a/website/generated-content/documentation/index.xml
+++ b/website/generated-content/documentation/index.xml
@@ -10024,114 +10024,201 @@ See the License for the specific language governing
permissions and
limitations under the License.
-->
<h1 id="container-environments">Container environments</h1>
-<p>The Beam SDK runtime environment is isolated from other runtime systems
because the SDK runtime environment is <a
href="https://s.apache.org/beam-fn-api-container-contract">containerized</a>
with <a href="https://www.docker.com/">Docker</a>. This means that any
execution engine can run the Beam SDK.</p>
-<p>This page describes how to customize, build, and push Beam SDK container
images.</p>
-<p>Before you begin, install <a
href="https://www.docker.com/">Docker</a> on your workstation.</p>
-<h2 id="customizing-container-images">Customizing container images</h2>
-<p>You can add extra dependencies to container images so that you
don&rsquo;t have to supply the dependencies to execution engines.</p>
-<p>To customize a container image, either:</p>
-<ul>
-<li><a href="#writing-new-dockerfiles">Write a new</a> <a
href="https://docs.docker.com/engine/reference/builder/">Dockerfile</a> on
top of the original.</li>
-<li><a href="#modifying-dockerfiles">Modify</a> the <a
href="https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile">original
Dockerfile</a> and reimage the container.</li>
-</ul>
-<p>It&rsquo;s often easier to write a new Dockerfile. However, by
modifying the original Dockerfile, you can customize anything (including the
base OS).</p>
-<h3 id="writing-new-dockerfiles">Writing new Dockerfiles on top of the
original</h3>
+<p>The Beam SDK runtime environment can be <a
href="https://www.docker.com/resources/what-container">containerized</a>
with <a href="https://www.docker.com/">Docker</a> to isolate it from
other runtime systems. To learn more about the container environment, read the
Beam <a href="https://s.apache.org/beam-fn-api-container-contract">SDK
Harness container contract</a>.</p>
+<p>Prebuilt SDK container images are released per supported language during
Beam releases and pushed to <a
href="https://hub.docker.com/search?q=apache%2Fbeam&amp;type=image">Docker
Hub</a>.</p>
+<h2 id="custom-containers">Custom containers</h2>
+<p>You may want to customize container images for many reasons,
including:</p>
+<ul>
+<li>Pre-installing additional dependencies</li>
+<li>Launching third-party software in the worker environment</li>
+<li>Further customizing the execution environment</li>
+</ul>
+<p>This guide describes how to create and use customized containers for the
Beam SDK.</p>
+<h3 id="prerequisites">Prerequisites</h3>
+<ul>
+<li>This guide requires building images using Docker. <a
href="https://docs.docker.com/get-docker/">Install Docker locally</a>. Some
CI/CD platforms like <a
href="https://cloud.google.com/cloud-build/docs/building/build-containers">Google
Cloud Build</a> also provide the ability to build images using
Docker.</li>
+<li>For remote execution engines/runners, have a container registry to host
your custom container image. Options include <a
href="https://hub.docker.com/">Docker Hub</a> or a
&ldquo;self-hosted&rdquo; repository, including cloud-specific
container registries like <a
href="https://cloud.google.com/container-registry">Google Container
Registry</a> (GCR) or <a href="https://aws.amazon.com/ecr/">Amazon
Elastic Container Registry</a> (ECR). Make sure your registry [...]
+</ul>
+<blockquote>
+<p><strong>NOTE</strong>: On Nov 20, 2020, Docker Hub put <a
href="https://www.docker.com/increase-rate-limits">rate limits</a> into
effect for anonymous and free authenticated use, which may impact larger
pipelines that pull containers several times.</p>
+</blockquote>
+<p>For optimal user experience, we also recommend you use the latest
released version of Beam.</p>
+<h3 id="building-and-pushing-custom-containers">Building and pushing custom
containers</h3>
+<p>Beam <a
href="https://hub.docker.com/search?q=apache%2Fbeam&amp;type=image">SDK
container images</a> are built from Dockerfiles checked into the <a
href="https://github.com/apache/beam">GitHub</a> repository and published to
Docker Hub for every release. You can build customized containers in one of two
ways:</p>
<ol>
-<li>Pull a <a
href="https://hub.docker.com/search?q=apache%2Fbeam&amp;type=image">prebuilt
SDK container image</a> for your <a
href="https://docs.docker.com/docker-hub/repos/#searching-for-repositories">target</a>
language and version. The following example pulls the latest Python
SDK:</li>
+<li><strong><a href="#writing-new-dockerfiles">Writing a new</a>
Dockerfile based on a released container image</strong>. This is sufficient
for simple additions to the image, such as adding artifacts or environment
variables.</li>
+<li><strong><a href="#modifying-dockerfiles">Modifying</a> a
source Dockerfile in <a
href="https://github.com/apache/beam">Beam</a></strong>. This method
requires building from Beam source but allows for greater customization of the
container (including replacement of artifacts or base OS/language
versions).</li>
</ol>
-<pre><code>docker pull apache/beam_python3.7_sdk
-</code></pre><ol start="2">
-<li><a
href="https://docs.docker.com/develop/develop-images/dockerfile_best-practices/">Write
a new Dockerfile</a> that <a
href="https://docs.docker.com/engine/reference/builder/#from">designates</a>
the original as its <a
href="https://docs.docker.com/glossary/?term=parent%20image">parent</a>.</li>
-<li><a href="#building-container-images">Build</a> a child
image.</li>
+<h4 id="writing-new-dockerfiles">Writing a new Dockerfile based on an
existing published container image</h4>
+<ol>
+<li>Create a new Dockerfile that designates a base image using the <a
href="https://docs.docker.com/engine/reference/builder/#from">FROM
instruction</a>.</li>
</ol>
-<h3 id="modifying-dockerfiles">Modifying the original Dockerfile</h3>
+<pre><code>FROM apache/beam_python3.7_sdk:2.25.0
+ENV FOO=bar
+COPY /src/path/to/file /dest/path/to/file/
+</code></pre><p>This <code>Dockerfile</code> uses the prebuilt
Python 3.7 SDK container image <a
href="https://hub.docker.com/r/apache/beam_python3.7_sdk"><code>beam_python3.7_sdk</code></a>
tagged at (SDK version) <code>2.25.0</code>, and adds an additional
environment variable and file to the image.</p>
+<ol start="2">
+<li><a
href="https://docs.docker.com/engine/reference/commandline/build/">Build</a>
and <a
href="https://docs.docker.com/engine/reference/commandline/push/">push</a>
the image using Docker.</li>
+</ol>
+<pre><code>export
BASE_IMAGE=&quot;apache/beam_python3.7_sdk:2.25.0&quot;
+export IMAGE_NAME=&quot;myremoterepo/mybeamsdk&quot;
+export TAG=&quot;latest&quot;
+# Optional - pull the base image into your local Docker daemon to ensure
+# you have the most up-to-date version of the base image locally.
+docker pull &quot;${BASE_IMAGE}&quot;
+docker build -f Dockerfile -t &quot;${IMAGE_NAME}:${TAG}&quot; .
+</code></pre><ol start="3">
+<li>If your runner is running remotely, retag and <a
href="https://docs.docker.com/engine/reference/commandline/push/">push</a>
the image to the appropriate repository.</li>
+</ol>
+<pre><code>docker push &quot;${IMAGE_NAME}:${TAG}&quot;
+</code></pre><ol start="4">
+<li>After pushing a container image, verify that the remote image ID and digest
match the local image ID and digest, as output by <code>docker
build</code> or <code>docker images</code>.</li>
+</ol>
+<h4 id="modifying-dockerfiles">Modifying a source Dockerfile in Beam</h4>
+<p>This method requires building image artifacts from Beam source. For
additional instructions on setting up your development environment, see the
<a href="/contribute/#development-setup">Contribution guide</a>.</p>
+<blockquote>
+<p><strong>NOTE</strong>: It is recommended that you start from a
stable release branch (<code>release-X.XX.X</code>) that matches the
version of the SDK you use to run your pipeline. Differences in SDK version may
result in unexpected errors.</p>
+</blockquote>
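A minimal Python sketch of the version-matching advice in the note above. The helper names `release_branch` and `tag_matches_sdk` are hypothetical, and the `<version>-custom` tag shape is an assumption borrowed from the tagging example later in this page. In a real pipeline environment the installed SDK version could be read from `apache_beam.__version__`; the sketch takes plain strings so that it runs without Beam installed.

```python
def release_branch(sdk_version: str) -> str:
    """Return the release branch name for an SDK version, e.g. release-2.26.0."""
    return f"release-{sdk_version}"


def tag_matches_sdk(image_tag: str, sdk_version: str) -> bool:
    """True if the tag is the SDK version itself or the version plus a '-suffix'.

    Assumes custom images are tagged '<sdk_version>' or '<sdk_version>-<suffix>'.
    """
    return image_tag == sdk_version or image_tag.startswith(sdk_version + "-")


if __name__ == "__main__":
    print(release_branch("2.26.0"))
    print(tag_matches_sdk("2.26.0-custom", "2.26.0"))
```

A check like this could run before job submission to catch a container built from one release being used with a pipeline SDK from another.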
<ol>
-<li>Clone the <code>beam</code> repository:</li>
+<li>Clone the <code>beam</code> repository.</li>
</ol>
-<pre><code>git clone https://github.com/apache/beam.git
+<pre><code>export BEAM_SDK_VERSION=&quot;2.26.0&quot;
+git clone https://github.com/apache/beam.git
+cd beam
+# Save current directory as working directory
+export BEAM_WORKDIR=$PWD
+git checkout origin/release-$BEAM_SDK_VERSION
</code></pre><ol start="2">
-<li>Customize the <a
href="https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile">Dockerfile</a>.
If you&rsquo;re adding dependencies from <a
href="https://pypi.org/">PyPI</a>, use <a
href="https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt"><code>base_image_requirements.txt</code></a>
instead.</li>
-<li><a href="#building-container-images">Reimage</a> the
container.</li>
+<li>
+<p>Customize the <code>Dockerfile</code> for a given language,
typically found at <code>sdks/&lt;language&gt;/container/Dockerfile</code>
(e.g. the <a
href="https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile">Dockerfile
for Python</a>). If you&rsquo;re adding dependencies from <a
href="https://pypi.org/">PyPI</a>, use <a
href="https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt">&l
[...]
+</li>
+<li>
+<p>Return to the root Beam directory and run the Gradle
<code>docker</code> target for your image.</p>
+</li>
+</ol>
+<pre><code>cd $BEAM_WORKDIR
+# The default repository of each SDK
+./gradlew :sdks:java:container:java8:docker
+./gradlew :sdks:java:container:java11:docker
+./gradlew :sdks:go:container:docker
+./gradlew :sdks:python:container:py36:docker
+./gradlew :sdks:python:container:py37:docker
+./gradlew :sdks:python:container:py38:docker
+# Shortcut for building all Python SDKs
+./gradlew :sdks:python:container buildAll
+</code></pre><ol start="4">
+<li>Verify the images you built were created by running <code>docker
images</code>.</li>
+</ol>
+<pre><code>$&gt; docker images --digests
+REPOSITORY TAG DIGEST IMAGE ID CREATED SIZE
+apache/beam_java8_sdk latest sha256:... ... 1 min ago ...
+apache/beam_java11_sdk latest sha256:... ... 1 min ago ...
+apache/beam_python3.6_sdk latest sha256:... ... 1 min ago ...
+apache/beam_python3.7_sdk latest sha256:... ... 1 min ago ...
+apache/beam_python3.8_sdk latest sha256:... ... 1 min ago ...
+apache/beam_go_sdk latest sha256:... ... 1 min ago ...
+</code></pre><ol start="5">
+<li>If your runner is running remotely, retag the image and <a
href="https://docs.docker.com/engine/reference/commandline/push/">push</a>
the image to your repository. You can skip this step if you provide a custom
repo/tag as <a href="#additional-build-parameters">additional
parameters</a>.</li>
</ol>
-<h3 id="testing-customized-images">Testing customized images</h3>
-<p>To test a customized image locally, run a pipeline with PortableRunner
and set the <code>--environment_config</code> flag to the image
path:</p>
+<pre><code>export BEAM_SDK_VERSION=&quot;2.26.0&quot;
+export IMAGE_NAME=&quot;gcr.io/my-gcp-project/beam_python3.7_sdk&quot;
+export TAG=&quot;${BEAM_SDK_VERSION}-custom&quot;
+docker tag apache/beam_python3.7_sdk &quot;${IMAGE_NAME}:${TAG}&quot;
+docker push &quot;${IMAGE_NAME}:${TAG}&quot;
+</code></pre><ol start="6">
+<li>After pushing a container image, verify that the remote image ID and digest
match the local image ID and digest output from <code>docker images
--digests</code>.</li>
+</ol>
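One way to make the digest comparison in the last step less error-prone is to parse the table printed by `docker images --digests` instead of reading it by eye. The sketch below is an assumption-laden illustration: it assumes columns are separated by two or more spaces (as in the sample table above), the function names are hypothetical, and the sample values are placeholders rather than real output.

```python
import re
from typing import Dict, List, Optional


def parse_images_table(output: str) -> List[Dict[str, str]]:
    """Parse `docker images --digests` text into one dict per image row.

    Assumes columns are separated by at least two spaces, so that
    multi-word cells such as 'IMAGE ID' or '1 min ago' stay intact.
    """
    lines = [ln for ln in output.splitlines() if ln.strip()]
    if not lines:
        return []
    header = re.split(r"\s{2,}", lines[0].strip())
    return [dict(zip(header, re.split(r"\s{2,}", ln.strip()))) for ln in lines[1:]]


def local_digest(rows: List[Dict[str, str]], repository: str, tag: str) -> Optional[str]:
    """Return the DIGEST cell for repository:tag, or None if it is not listed."""
    for row in rows:
        if row.get("REPOSITORY") == repository and row.get("TAG") == tag:
            return row.get("DIGEST")
    return None


if __name__ == "__main__":
    # Placeholder table; the digest and ID values are made up for illustration.
    sample = (
        "REPOSITORY                 TAG     DIGEST         IMAGE ID   CREATED     SIZE\n"
        "apache/beam_python3.7_sdk  latest  sha256:abc123  0123abcd   1 min ago   1GB\n"
    )
    print(local_digest(parse_images_table(sample), "apache/beam_python3.7_sdk", "latest"))
```

The returned digest can then be compared against the digest reported by your registry for the pushed image.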
+<h4 id="additional-build-parameters">Additional build parameters</h4>
+<p>The <code>docker</code> Gradle task defines a default image repository and <a
href="https://docs.docker.com/engine/reference/commandline/tag/">tag</a>.
The default repository is the Docker Hub <code>apache</code> namespace,
and the default tag is the <a
href="https://github.com/apache/beam/blob/master/gradle.properties">SDK
version</a> defined at gra [...]
+<p>You can specify a different repository or tag for built images by
providing parameters to the build task. For example:</p>
+<pre><code>./gradlew :sdks:python:container:py36:docker
-Pdocker-repository-root=&quot;example-repo&quot;
-Pdocker-tag=&quot;2.26.0-custom&quot;
+</code></pre><p>builds the Python 3.6 container and tags it as
<code>example-repo/beam_python3.6_sdk:2.26.0-custom</code>.</p>
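The relationship between the build parameters and the resulting image name can be sketched in Python. The naming pattern `<repository-root>/beam_<sdk>_sdk:<tag>` is inferred from the examples on this page (`apache/beam_java8_sdk`, `example-repo/beam_python3.6_sdk:2.26.0-custom`) and is an assumption, not something read from the Gradle build; both helper names are hypothetical.

```python
from typing import List


def built_image_name(sdk: str, tag: str, repository_root: str = "apache") -> str:
    """Compose the expected image name, assuming the beam_<sdk>_sdk pattern."""
    return f"{repository_root}/beam_{sdk}_sdk:{tag}"


def gradle_docker_command(task: str, repository_root: str, tag: str) -> List[str]:
    """Build the argument list for the Gradle docker task shown above."""
    return [
        "./gradlew",
        task,
        f"-Pdocker-repository-root={repository_root}",
        f"-Pdocker-tag={tag}",
    ]


if __name__ == "__main__":
    print(" ".join(gradle_docker_command(
        ":sdks:python:container:py36:docker", "example-repo", "2.26.0-custom")))
    print(built_image_name("python3.6", "2.26.0-custom", "example-repo"))
```

Keeping the repository root and tag in one place makes it easier to pass the same image name to the build, to `docker push`, and to the pipeline flags.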
+<p>Beam 2.21.0 and later support a <code>docker-pull-licenses</code>
flag that adds licenses/notices for third-party dependencies to the
Docker images. For example:</p>
+<pre><code>./gradlew :sdks:java:container:java8:docker
-Pdocker-pull-licenses
+</code></pre><p>creates a Java 8 SDK image with appropriate licenses
in <code>/opt/apache/beam/third_party_licenses/</code>.</p>
+<p>By default, no licenses/notices are added to the docker images.</p>
+<h2 id="running-pipelines">Running pipelines with custom container
images</h2>
+<p>The common method for providing a container image is the
+PortableRunner flag <code>--environment_config</code>, which is supported by
+the Portable Runner and by runners that support PortableRunner flags.
+Other runners, such as Dataflow, support specifying containers with different
flags.</p>
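When a pipeline is launched from Python code rather than from the shell, the same flags can be assembled as a list. This is a minimal sketch: `container_flags` is a hypothetical helper and the image name is a placeholder. The resulting list could be passed to `PipelineOptions` from `apache_beam.options.pipeline_options`, which is not exercised here so that the sketch runs without Beam installed.

```python
from typing import List


def container_flags(image: str, tag: str, runner: str = "PortableRunner") -> List[str]:
    """Return pipeline flags that select a custom Docker SDK container."""
    image_url = f"{image}:{tag}"
    return [
        f"--runner={runner}",
        "--environment_type=DOCKER",
        f"--environment_config={image_url}",
    ]


if __name__ == "__main__":
    # Placeholder image name and tag, mirroring the shell examples on this page.
    for flag in container_flags("my-repo/beam_python_sdk_custom", "X.Y.Z"):
        print(flag)
```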
<div class=runner-direct>
-<pre><code>python -m apache_beam.examples.wordcount \
+<pre><code>export IMAGE=&#34;my-repo/beam_python_sdk_custom&#34;
+export TAG=&#34;X.Y.Z&#34;
+export IMAGE_URL=&#34;${IMAGE}:${TAG}&#34;
+python -m apache_beam.examples.wordcount \
--input=/path/to/inputfile \
--output /path/to/write/counts \
--runner=PortableRunner \
--job_endpoint=embed \
---environment_config=path/to/container/image</code></pre>
+--environment_type=&#34;DOCKER&#34; \
+--environment_config=&#34;${IMAGE_URL}&#34;</code></pre>
</div>
<div class=runner-flink-local>
-<pre><code># Start a Flink job server on localhost:8099
-./gradlew :runners:flink:1.8:job-server:runShadow
-# Run a pipeline on the Flink job server
+<pre><code>export IMAGE=&#34;my-repo/beam_python_sdk_custom&#34;
+export TAG=&#34;X.Y.Z&#34;
+export IMAGE_URL=&#34;${IMAGE}:${TAG}&#34;
+# Run a pipeline using the FlinkRunner which starts a Flink job server.
python -m apache_beam.examples.wordcount \
--input=/path/to/inputfile \
---output=/path/to/write/counts \
---runner=PortableRunner \
---job_endpoint=localhost:8099 \
---environment_config=path/to/container/image</code></pre>
+--output=path/to/write/counts \
+--runner=FlinkRunner \
+--environment_type=&#34;DOCKER&#34; \
+--environment_config=&#34;${IMAGE_URL}&#34;</code></pre>
</div>
<div class=runner-spark-local>
-<pre><code># Start a Spark job server on localhost:8099
-./gradlew :runners:spark:job-server:runShadow
-# Run a pipeline on the Spark job server
+<pre><code>export IMAGE=&#34;my-repo/beam_python_sdk_custom&#34;
+export TAG=&#34;X.Y.Z&#34;
+export IMAGE_URL=&#34;${IMAGE}:${TAG}&#34;
+# Run a pipeline using the SparkRunner which starts the Spark job server
python -m apache_beam.examples.wordcount \
--input=/path/to/inputfile \
--output=path/to/write/counts \
---runner=PortableRunner \
---job_endpoint=localhost:8099 \
---environment_config=path/to/container/image</code></pre>
-</div>
-<h2 id="building-container-images">Building container images</h2>
-<p>To build Beam SDK container images:</p>
-<ol>
-<li>Navigate to the root directory of the local copy of your Apache
Beam.</li>
-<li>Run Gradle with the <code>docker</code> target. If
you&rsquo;re <a href="#writing-new-dockerfiles">building a child
image</a>, set the optional <code>--file</code> flag to the new
Dockerfile. If you&rsquo;re <a href="#modifying-dockerfiles">building an
image from an original Dockerfile</a>, ignore the <code>--file</code>
flag:</li>
-</ol>
-<pre><code># The default repository of each SDK
-./gradlew [--file=path/to/new/Dockerfile] :sdks:java:container:java8:docker
-./gradlew [--file=path/to/new/Dockerfile] :sdks:java:container:java11:docker
-./gradlew [--file=path/to/new/Dockerfile] :sdks:go:container:docker
-./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container:py2:docker
-./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container:py35:docker
-./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container:py36:docker
-./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container:py37:docker
-# Shortcut for building all four Python SDKs
-./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container buildAll
-</code></pre><p>From 2.21.0, <code>docker-pull-licenses</code>
tag was introduced. Licenses/notices of third party dependencies will be added
to the docker images when <code>docker-pull-licenses</code> was set.
-For example, <code>./gradlew :sdks:java:container:java8:docker
-Pdocker-pull-licenses</code>. The files are added to
<code>/opt/apache/beam/third_party_licenses/</code>.
-By default, no licenses/notices are added to the docker images.</p>
-<p>To examine the containers that you built, run <code>docker
images</code> from anywhere in the command line. If you successfully built
all of the container images, the command prints a table like the
following:</p>
-<pre><code>REPOSITORY TAG IMAGE ID CREATED SIZE
-apache/beam_java8_sdk latest ... 2 weeks ago ...
-apache/beam_java11_sdk latest ... 2 weeks ago ...
-apache/beam_python2.7_sdk latest ... 2 weeks ago ...
-apache/beam_python3.5_sdk latest ... 2 weeks ago ...
-apache/beam_python3.6_sdk latest ... 2 weeks ago ...
-apache/beam_python3.7_sdk latest ... 2 weeks ago ...
-apache/beam_go_sdk latest ... 2 weeks ago ...
-</code></pre><h3 id="overriding-default-docker-targets">Overriding
default Docker targets</h3>
-<p>The default <a
href="https://docs.docker.com/engine/reference/commandline/tag/">tag</a> is
sdk_version defined at <a
href="https://github.com/apache/beam/blob/master/gradle.properties">gradle.properties</a>
and the default repositories are in the Docker Hub <code>apache</code>
namespace.
-The <code>docker</code> command-line tool implicitly <a
href="#pushing-container-images">pushes container images</a> to this
location.</p>
-<p>To tag a local image, set the <code>docker-tag</code> option when
building the container. The following command tags a Python SDK image with a
date.</p>
-<pre><code>./gradlew :sdks:python:container:py36:docker
-Pdocker-tag=2019-10-04
-</code></pre><p>To change the repository, set the
<code>docker-repository-root</code> option to a new location. The
following command sets the <code>docker-repository-root</code>
-to a repository named <code>example-repo</code> on Docker Hub.</p>
-<pre><code>./gradlew :sdks:python:container:py36:docker
-Pdocker-repository-root=example-repo
-</code></pre><h2 id="pushing-container-images">Pushing container
images</h2>
-<p>After <a href="#building-container-images">building a container
image</a>, you can store it in a remote Docker repository.</p>
-<p>The following steps push a Python3.6 SDK image to the <a
href="#overriding-default-docker-targets"><code>docker-root-repository</code>
value</a>.
-Please log in to the destination repository as needed.</p>
-<p>Upload it to the remote repository:</p>
-<pre><code>docker push example-repo/beam_python3.6_sdk
-</code></pre><p>To download the image again, run <code>docker
pull</code>:</p>
-<pre><code>docker pull example-repo/beam_python3.6_sdk
-</code></pre><blockquote>
-<p><strong>Note</strong>: After pushing a container image, the remote
image ID and digest match the local image ID and digest.</p>
-</blockquote></description></item><item><title>Documentation:
Count</title><link>/documentation/transforms/java/aggregation/count/</link><pubDate>Mon,
01 Jan 0001 00:00:00
+0000</pubDate><guid>/documentation/transforms/java/aggregation/count/</guid><description>
+--runner=SparkRunner \
+--environment_type=&#34;DOCKER&#34; \
+--environment_config=&#34;${IMAGE_URL}&#34;</code></pre>
+</div>
+<div class=runner-dataflow>
+<pre><code>export GCS_PATH=&#34;gs://my-gcs-bucket&#34;
+export GCP_PROJECT=&#34;my-gcp-project&#34;
+export REGION=&#34;us-central1&#34;
+# By default, the Dataflow runner has access to the GCR images
+# under the same project.
+export IMAGE=&#34;my-repo/beam_python_sdk_custom&#34;
+export TAG=&#34;X.Y.Z&#34;
+export IMAGE_URL=&#34;${IMAGE}:${TAG}&#34;
+# Run a pipeline on Dataflow.
+# This is a Python batch pipeline, so to run on Dataflow Runner V2
+# you must specify the experiment &#34;use_runner_v2&#34;
+python -m apache_beam.examples.wordcount \
+--input gs://dataflow-samples/shakespeare/kinglear.txt \
+--output &#34;${GCS_PATH}/counts&#34; \
+--runner DataflowRunner \
+--project $GCP_PROJECT \
+--region $REGION \
+--temp_location &#34;${GCS_PATH}/tmp/&#34; \
+--experiment=use_runner_v2 \
+--worker_harness_container_image=$IMAGE_URL</code></pre>
+</div>
+<h3 id="troubleshooting">Troubleshooting</h3>
+<p>The following section describes some common issues to consider
+when you encounter unexpected errors running Beam pipelines with
+custom containers.</p>
+<ul>
+<li>Differences in language and SDK version between the container SDK and
+pipeline SDK may result in unexpected errors due to incompatibility. For best
+results, make sure to use the same stable SDK version for your base container
+and when running your pipeline.</li>
+<li>If you are running into unexpected errors when using remote containers,
+make sure that your container exists in the remote repository and can be
+accessed by any third-party service, if needed.</li>
+<li>Local runners attempt to pull remote images and default to local
+images. If an image cannot be pulled locally (by the docker daemon),
+you may see a log message like:
+<pre><code>Error response from daemon: manifest for
remote.repo/beam_python3.7_sdk:2.25.0-custom not found: manifest unknown: ...
+INFO:apache_beam.runners.portability.fn_api_runner.worker_handlers:Unable to
pull image...
+</code></pre></li>
+</ul></description></item><item><title>Documentation:
Count</title><link>/documentation/transforms/java/aggregation/count/</link><pubDate>Mon,
01 Jan 0001 00:00:00
+0000</pubDate><guid>/documentation/transforms/java/aggregation/count/</guid><description>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@@ -12470,10 +12557,10 @@ allows you to directly access tables in BigQuery
storage, and supports features
such as column selection and predicate filter push-down which can allow more
efficient pipeline execution.</p>
<p>The Beam SDK for Java supports using the BigQuery Storage API when
reading from
-BigQuery. SDK versions before 2.24.0 support the BigQuery Storage API as an
+BigQuery. SDK versions before 2.25.0 support the BigQuery Storage API as an
<a
href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/annotations/Experimental.html">experimental
feature</a>
and use the pre-GA BigQuery Storage API surface. Callers should migrate
-pipelines which use the BigQuery Storage API to use SDK version 2.24.0 or
later.</p>
+pipelines which use the BigQuery Storage API to use SDK version 2.25.0 or
later.</p>
<p>The Beam SDK for Python does not support the BigQuery Storage API. See
<a
href="https://issues.apache.org/jira/browse/BEAM-10917">BEAM-10917</a>.</p>
<h4 id="updating-your-code">Updating your code</h4>
diff --git
a/website/generated-content/documentation/io/built-in/google-bigquery/index.html
b/website/generated-content/documentation/io/built-in/google-bigquery/index.html
index 7a15a54..6b6b767 100644
---
a/website/generated-content/documentation/io/built-in/google-bigquery/index.html
+++
b/website/generated-content/documentation/io/built-in/google-bigquery/index.html
@@ -247,10 +247,10 @@ in the following example:</p><div
class=language-java><div class=highlight><pre
allows you to directly access tables in BigQuery storage, and supports features
such as column selection and predicate filter push-down which can allow more
efficient pipeline execution.</p><p>The Beam SDK for Java supports using the
BigQuery Storage API when reading from
-BigQuery. SDK versions before 2.24.0 support the BigQuery Storage API as an
+BigQuery. SDK versions before 2.25.0 support the BigQuery Storage API as an
<a
href=https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/annotations/Experimental.html>experimental
feature</a>
and use the pre-GA BigQuery Storage API surface. Callers should migrate
-pipelines which use the BigQuery Storage API to use SDK version 2.24.0 or
later.</p><p>The Beam SDK for Python does not support the BigQuery Storage API.
See
+pipelines which use the BigQuery Storage API to use SDK version 2.25.0 or
later.</p><p>The Beam SDK for Python does not support the BigQuery Storage API.
See
<a
href=https://issues.apache.org/jira/browse/BEAM-10917>BEAM-10917</a>.</p><h4
id=updating-your-code>Updating your code</h4><p>Use the following methods when
you read from a table:</p><ul><li>Required: Specify <a
href=https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.TypedRead.html#withMethod-org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.TypedRead.Method->withMethod(Method.DIRECT_READ)</a>
to use the BigQuery Storage API for the read opera [...]
example</a>.
When the example’s read method option is set to
<code>DIRECT_READ</code>, the pipeline uses
diff --git
a/website/generated-content/documentation/runtime/environments/index.html
b/website/generated-content/documentation/runtime/environments/index.html
index 7dc909d..3f35f3f 100644
--- a/website/generated-content/documentation/runtime/environments/index.html
+++ b/website/generated-content/documentation/runtime/environments/index.html
@@ -1,60 +1,123 @@
<!doctype html><html lang=en class=no-js><head><meta charset=utf-8><meta
http-equiv=x-ua-compatible content="IE=edge"><meta name=viewport
content="width=device-width,initial-scale=1"><title>Container
environments</title><meta name=description content="Apache Beam is an open
source, unified model and set of language-specific SDKs for defining and
executing data processing workflows, and also data ingestion and integration
flows, supporting Enterprise Integration Patterns (EIPs) and Domain [...]
<span class=sr-only>Toggle navigation</span>
<span class=icon-bar></span><span class=icon-bar></span><span
class=icon-bar></span></button>
-<a href=/ class=navbar-brand><img alt=Brand style=height:25px
src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask
closed"></div><div id=navbar class="navbar-container closed"><ul class="nav
navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a
href=/documentation/>Documentation</a></li><li><a
href=/documentation/sdks/java/>Languages</a></li><li><a
href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a
href=/roadmap/>Roadmap</a></li>< [...]
-</code></pre><ol start=2><li><a
href=https://docs.docker.com/develop/develop-images/dockerfile_best-practices/>Write
a new Dockerfile</a> that <a
href=https://docs.docker.com/engine/reference/builder/#from>designates</a> the
original as its <a
href="https://docs.docker.com/glossary/?term=parent%20image">parent</a>.</li><li><a
href=#building-container-images>Build</a> a child image.</li></ol><h3
id=modifying-dockerfiles>Modifying the original Dockerfile</h3><ol><li>Clone
the <code>beam</c [...]
-</code></pre><ol start=2><li>Customize the <a
href=https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile>Dockerfile</a>.
If you’re adding dependencies from <a href=https://pypi.org/>PyPI</a>,
use <a
href=https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt><code>base_image_requirements.txt</code></a>
instead.</li><li><a href=#building-container-images>Reimage</a> the
container.</li></ol><h3 id=testing-customized-images>T [...]
+<a href=/ class=navbar-brand><img alt=Brand style=height:25px
src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask
closed"></div><div id=navbar class="navbar-container closed"><ul class="nav
navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a
href=/documentation/>Documentation</a></li><li><a
href=/documentation/sdks/java/>Languages</a></li><li><a
href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a
href=/roadmap/>Roadmap</a></li>< [...]
+
+ENV FOO=bar
+COPY /src/path/to/file /dest/path/to/file/
+</code></pre><p>This <code>Dockerfile</code> uses the prebuilt Python 3.7 SDK
container image <a
href=https://hub.docker.com/r/apache/beam_python3.7_sdk><code>beam_python3.7_sdk</code></a>
tagged at (SDK version) <code>2.25.0</code>, and adds an additional
environment variable and file to the image.</p><ol start=2><li><a
href=https://docs.docker.com/engine/reference/commandline/build/>Build</a> and
<a href=https://docs.docker.com/engine/reference/commandline/push/>push</a> the
image usin [...]
+export IMAGE_NAME="myremoterepo/mybeamsdk"
+export TAG="latest"
+
+# Optional - pull the base image into your local Docker daemon to ensure
+# you have the most up-to-date version of the base image locally.
+docker pull "${BASE_IMAGE}"
+
+docker build -f Dockerfile -t "${IMAGE_NAME}:${TAG}" .
+</code></pre><ol start=3><li>If your runner is running remotely, retag and <a
href=https://docs.docker.com/engine/reference/commandline/push/>push</a> the
image to the appropriate repository.</li></ol><pre><code>docker push
"${IMAGE_NAME}:${TAG}"
+</code></pre><ol start=4><li>After pushing a container image, verify that the
remote image ID and digest match the local image ID and digest, as output by
<code>docker build</code> or <code>docker images</code>.</li></ol><h4
id=modifying-dockerfiles>Modifying a source Dockerfile in Beam</h4><p>This
method requires building image artifacts from Beam source. For additional
instructions on setting up your development environment, see the <a
href=/contribute/#development-setup>Contribution guide [...]
+git clone https://github.com/apache/beam.git
+cd beam
+
+# Save current directory as working directory
+export BEAM_WORKDIR=$PWD
+
+git checkout origin/release-$BEAM_SDK_VERSION
+</code></pre><ol start=2><li><p>Customize the <code>Dockerfile</code> for a
given language, typically found at <code>sdks/&lt;language&gt;/container/Dockerfile</code>
(e.g. the <a
href=https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile>Dockerfile
for Python</a>). If you’re adding dependencies from <a
href=https://pypi.org/>PyPI</a>, use <a
href=https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt><code>base_image_require
[...]
+
+# The default repository of each SDK
+./gradlew :sdks:java:container:java8:docker
+./gradlew :sdks:java:container:java11:docker
+./gradlew :sdks:go:container:docker
+./gradlew :sdks:python:container:py36:docker
+./gradlew :sdks:python:container:py37:docker
+./gradlew :sdks:python:container:py38:docker
+
+# Shortcut for building all Python SDKs
+./gradlew :sdks:python:container buildAll
+</code></pre><ol start=4><li>To verify that the images you built were created,
run <code>docker images</code>.</li></ol><pre><code>$> docker images
--digests
+REPOSITORY TAG DIGEST
IMAGE ID CREATED SIZE
+apache/beam_java8_sdk latest sha256:...
... 1 min ago ...
+apache/beam_java11_sdk latest sha256:...
... 1 min ago ...
+apache/beam_python3.6_sdk latest sha256:...
... 1 min ago ...
+apache/beam_python3.7_sdk latest sha256:...
... 1 min ago ...
+apache/beam_python3.8_sdk latest sha256:...
... 1 min ago ...
+apache/beam_go_sdk latest sha256:...
... 1 min ago ...
+</code></pre><ol start=5><li>If your runner is running remotely, retag the
image and <a
href=https://docs.docker.com/engine/reference/commandline/push/>push</a> the
image to your repository. You can skip this step if you provide a custom
repo/tag as <a href=#additional-build-parameters>additional
parameters</a>.</li></ol><pre><code>export BEAM_SDK_VERSION="2.26.0"
+export IMAGE_NAME="gcr.io/my-gcp-project/beam_python3.7_sdk"
+export TAG="${BEAM_SDK_VERSION}-custom"
+
+docker tag apache/beam_python3.7_sdk "${IMAGE_NAME}:${TAG}"
+docker push "${IMAGE_NAME}:${TAG}"
+</code></pre><ol start=6><li>After pushing a container image, verify that the
remote image ID and digest match the local image ID and digest reported by
<code>docker images --digests</code>.</li></ol><h4
id=additional-build-parameters>Additional build parameters</h4><p>The docker
Gradle task defines a default image repository; the default <a
href=https://docs.docker.com/engine/reference/commandline/tag/>tag</a> is the
SDK version defined at <a
href=https://github.com/apache/beam/blob/master/gradle.p [...]
+</code></pre><p>builds the Python 3.6 container and tags it as
<code>example-repo/beam_python3.6_sdk:2.26.0-custom</code>.</p><p>Beam
2.21.0 introduced a <code>docker-pull-licenses</code> flag that
adds licenses/notices for third-party dependencies to the Docker images. For
example:</p><pre><code>./gradlew :sdks:java:container:java8:docker
-Pdocker-pull-licenses
+</code></pre><p>creates a Java 8 SDK image with appropriate licenses in
<code>/opt/apache/beam/third_party_licenses/</code>.</p><p>By default, no
licenses/notices are added to the docker images.</p><h2
id=running-pipelines>Running pipelines with custom container images</h2><p>The
common method for providing a container image is to use the
+PortableRunner flag <code>--environment_config</code>, which is supported by the
Portable
+Runner and by runners that support PortableRunner flags.
+Other runners, such as Dataflow, support specifying containers with different
flags.</p><div class=runner-direct><pre><code>export
IMAGE="my-repo/beam_python_sdk_custom"
+export TAG="X.Y.Z"
+export IMAGE_URL="${IMAGE}:${TAG}"
+
+python -m apache_beam.examples.wordcount \
--input=/path/to/inputfile \
--output /path/to/write/counts \
--runner=PortableRunner \
--job_endpoint=embed \
---environment_config=path/to/container/image</code></pre></div><div
class=runner-flink-local><pre><code># Start a Flink job server on localhost:8099
-./gradlew :runners:flink:1.8:job-server:runShadow
+--environment_type="DOCKER" \
+--environment_config="${IMAGE_URL}"</code></pre></div><div
class=runner-flink-local><pre><code>export
IMAGE="my-repo/beam_python_sdk_custom"
+export TAG="X.Y.Z"
+export IMAGE_URL="${IMAGE}:${TAG}"
-# Run a pipeline on the Flink job server
+# Run a pipeline using the FlinkRunner which starts a Flink job server.
python -m apache_beam.examples.wordcount \
--input=/path/to/inputfile \
---output=/path/to/write/counts \
---runner=PortableRunner \
---job_endpoint=localhost:8099 \
---environment_config=path/to/container/image</code></pre></div><div
class=runner-spark-local><pre><code># Start a Spark job server on localhost:8099
-./gradlew :runners:spark:job-server:runShadow
+--output=path/to/write/counts \
+--runner=FlinkRunner \
+--environment_type="DOCKER" \
+--environment_config="${IMAGE_URL}"</code></pre></div><div
class=runner-spark-local><pre><code>export
IMAGE="my-repo/beam_python_sdk_custom"
+export TAG="X.Y.Z"
+export IMAGE_URL="${IMAGE}:${TAG}"
-# Run a pipeline on the Spark job server
+# Run a pipeline using the SparkRunner which starts the Spark job server
python -m apache_beam.examples.wordcount \
--input=/path/to/inputfile \
--output=path/to/write/counts \
---runner=PortableRunner \
---job_endpoint=localhost:8099 \
---environment_config=path/to/container/image</code></pre></div><h2
id=building-container-images>Building container images</h2><p>To build Beam SDK
container images:</p><ol><li>Navigate to the root directory of the local copy
of your Apache Beam.</li><li>Run Gradle with the <code>docker</code> target. If
you’re <a href=#writing-new-dockerfiles>building a child image</a>, set
the optional <code>--file</code> flag to the new Dockerfile. If you’re <a
href=#modifying-dockerfiles>b [...]
-./gradlew [--file=path/to/new/Dockerfile] :sdks:java:container:java8:docker
-./gradlew [--file=path/to/new/Dockerfile] :sdks:java:container:java11:docker
-./gradlew [--file=path/to/new/Dockerfile] :sdks:go:container:docker
-./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container:py2:docker
-./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container:py35:docker
-./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container:py36:docker
-./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container:py37:docker
-
-# Shortcut for building all four Python SDKs
-./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container buildAll
-</code></pre><p>From 2.21.0, <code>docker-pull-licenses</code> tag was
introduced. Licenses/notices of third party dependencies will be added to the
docker images when <code>docker-pull-licenses</code> was set.
-For example, <code>./gradlew :sdks:java:container:java8:docker
-Pdocker-pull-licenses</code>. The files are added to
<code>/opt/apache/beam/third_party_licenses/</code>.
-By default, no licenses/notices are added to the docker images.</p><p>To
examine the containers that you built, run <code>docker images</code> from
anywhere in the command line. If you successfully built all of the container
images, the command prints a table like the following:</p><pre><code>REPOSITORY
TAG IMAGE ID CREATED
SIZE
-apache/beam_java8_sdk latest ... 2
weeks ago ...
-apache/beam_java11_sdk latest ... 2
weeks ago ...
-apache/beam_python2.7_sdk latest ... 2
weeks ago ...
-apache/beam_python3.5_sdk latest ... 2
weeks ago ...
-apache/beam_python3.6_sdk latest ... 2
weeks ago ...
-apache/beam_python3.7_sdk latest ... 2
weeks ago ...
-apache/beam_go_sdk latest ... 2
weeks ago ...
-</code></pre><h3 id=overriding-default-docker-targets>Overriding default
Docker targets</h3><p>The default <a
href=https://docs.docker.com/engine/reference/commandline/tag/>tag</a> is
sdk_version defined at <a
href=https://github.com/apache/beam/blob/master/gradle.properties>gradle.properties</a>
and the default repositories are in the Docker Hub <code>apache</code>
namespace.
-The <code>docker</code> command-line tool implicitly <a
href=#pushing-container-images>pushes container images</a> to this
location.</p><p>To tag a local image, set the <code>docker-tag</code> option
when building the container. The following command tags a Python SDK image with
a date.</p><pre><code>./gradlew :sdks:python:container:py36:docker
-Pdocker-tag=2019-10-04
-</code></pre><p>To change the repository, set the
<code>docker-repository-root</code> option to a new location. The following
command sets the <code>docker-repository-root</code>
-to a repository named <code>example-repo</code> on Docker
Hub.</p><pre><code>./gradlew :sdks:python:container:py36:docker
-Pdocker-repository-root=example-repo
-</code></pre><h2 id=pushing-container-images>Pushing container
images</h2><p>After <a href=#building-container-images>building a container
image</a>, you can store it in a remote Docker repository.</p><p>The following
steps push a Python3.6 SDK image to the <a
href=#overriding-default-docker-targets><code>docker-root-repository</code>
value</a>.
-Please log in to the destination repository as needed.</p><p>Upload it to the
remote repository:</p><pre><code>docker push example-repo/beam_python3.6_sdk
-</code></pre><p>To download the image again, run <code>docker
pull</code>:</p><pre><code>docker pull example-repo/beam_python3.6_sdk
-</code></pre><blockquote><p><strong>Note</strong>: After pushing a container
image, the remote image ID and digest match the local image ID and
digest.</p></blockquote></div></div><footer class=footer><div
class=footer__contained><div class=footer__cols><div
class=footer__cols__col><div class=footer__cols__col__logo><img
src=/images/beam_logo_circle.svg class=footer__logo alt="Beam logo"></div><div
class=footer__cols__col__logo><img src=/images/apache_logo_circle.svg
class=footer__logo a [...]
+--runner=SparkRunner \
+--environment_type="DOCKER" \
+--environment_config="${IMAGE_URL}"</code></pre></div><div
class=runner-dataflow><pre><code>export GCS_PATH="gs://my-gcs-bucket"
+export GCP_PROJECT="my-gcp-project"
+export REGION="us-central1"
+
+# By default, the Dataflow runner has access to the GCR images
+# under the same project.
+export IMAGE="my-repo/beam_python_sdk_custom"
+export TAG="X.Y.Z"
+export IMAGE_URL="${IMAGE}:${TAG}"
+
+# Run a pipeline on Dataflow.
+# This is a Python batch pipeline, so to run on Dataflow Runner V2
+# you must specify the experiment "use_runner_v2"
+
+python -m apache_beam.examples.wordcount \
+ --input gs://dataflow-samples/shakespeare/kinglear.txt \
+ --output "${GCS_PATH}/counts" \
+ --runner DataflowRunner \
+ --project $GCP_PROJECT \
+ --region $REGION \
+ --temp_location "${GCS_PATH}/tmp/" \
+ --experiment=use_runner_v2 \
+ --worker_harness_container_image=$IMAGE_URL</code></pre></div><h3
id=troubleshooting>Troubleshooting</h3><p>The following section describes some
common issues to consider
+when you encounter unexpected errors running Beam pipelines with
+custom containers.</p><ul><li>Differences in language and SDK version between
the container SDK and
+pipeline SDK may result in unexpected errors due to incompatibility. For best
+results, make sure to use the same stable SDK version for your base container
+and when running your pipeline.</li><li>If you are running into unexpected
errors when using remote containers,
+make sure that your container exists in the remote repository and can be
+accessed by any third-party service, if needed.</li><li>Local runners attempt
to pull remote images and default to local
+images. If an image cannot be pulled locally (by the docker daemon),
+you may see a log message like:<pre><code>Error response from daemon:
manifest for remote.repo/beam_python3.7_sdk:2.25.0-custom not found: manifest
unknown: ...
+INFO:apache_beam.runners.portability.fn_api_runner.worker_handlers:Unable to
pull image...
+</code></pre></li></ul></div></div><footer class=footer><div
class=footer__contained><div class=footer__cols><div
class=footer__cols__col><div class=footer__cols__col__logo><img
src=/images/beam_logo_circle.svg class=footer__logo alt="Beam logo"></div><div
class=footer__cols__col__logo><img src=/images/apache_logo_circle.svg
class=footer__logo alt="Apache logo"></div></div><div class="footer__cols__col
footer__cols__col--md"><div class=footer__cols__col__title>Start</div><div
class=foote [...]
<a href=http://www.apache.org>The Apache Software Foundation</a>
| <a href=/privacy_policy>Privacy Policy</a>
| <a href=/feed.xml>RSS Feed</a><br><br>Apache Beam, Apache, Beam, the Beam
logo, and the Apache feather logo are either registered trademarks or
trademarks of The Apache Software Foundation. All other products or name brands
are trademarks of their respective holders, including The Apache Software
Foundation.</div></footer></body></html>
\ No newline at end of file
diff --git a/website/generated-content/sitemap.xml
b/website/generated-content/sitemap.xml
index 2d71281..a6a4dab 100644
--- a/website/generated-content/sitemap.xml
+++ b/website/generated-content/sitemap.xml
@@ -1 +1 @@
-<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/categories/blog/</loc><lastmod>2020-12-14T07:36:58-08:00</lastmod></url><url><loc>/blog/</loc><lastmod>2020-12-14T07:36:58-08:00</lastmod></url><url><loc>/categories/</loc><lastmod>2020-12-14T07:36:58-08:00</lastmod></url><url><loc>/blog/splittable-do-fn-is-available/</loc><lastmod>2020-12-01T17:42:26-08:00</lastmod></url
[...]
\ No newline at end of file
+<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/categories/blog/</loc><lastmod>2020-12-14T07:36:58-08:00</lastmod></url><url><loc>/blog/</loc><lastmod>2020-12-14T07:36:58-08:00</lastmod></url><url><loc>/categories/</loc><lastmod>2020-12-14T07:36:58-08:00</lastmod></url><url><loc>/blog/splittable-do-fn-is-available/</loc><lastmod>2020-12-01T17:42:26-08:00</lastmod></url
[...]
\ No newline at end of file