rosetn commented on a change in pull request #13420:
URL: https://github.com/apache/beam/pull/13420#discussion_r539670389



##########
File path: website/www/site/content/en/documentation/runtime/environments.md
##########
@@ -17,147 +17,255 @@ limitations under the License.
 
 # Container environments
 
-The Beam SDK runtime environment is isolated from other runtime systems 
because the SDK runtime environment is 
[containerized](https://s.apache.org/beam-fn-api-container-contract) with 
[Docker](https://www.docker.com/). This means that any execution engine can run 
the Beam SDK.
+The Beam SDK runtime environment can be 
[containerized](https://www.docker.com/resources/what-container) with 
[Docker](https://www.docker.com/) to isolate it from other runtime systems. To 
learn more about the container environment, read the Beam [SDK Harness 
container contract](https://s.apache.org/beam-fn-api-container-contract).
 
-This page describes how to customize, build, and push Beam SDK container 
images.
+Prebuilt SDK container images are released per supported language during Beam 
releases and pushed to [Docker 
Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image).
 
-Before you begin, install [Docker](https://www.docker.com/) on your 
workstation.
+## Custom containers
 
-## Customizing container images
+You may want to customize container images for many reasons, including:
 
-You can add extra dependencies to container images so that you don't have to 
supply the dependencies to execution engines.
+* Pre-installing additional dependencies
+* Launching third-party software in the worker environment
+* Further customizing the execution environment
 
-To customize a container image, either:
-* [Write a new](#writing-new-dockerfiles) 
[Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the 
original.
-* [Modify](#modifying-dockerfiles) the [original 
Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile)
 and reimage the container.
+ This guide describes how to create and use customized containers for the Beam 
SDK.
 
-It's often easier to write a new Dockerfile. However, by modifying the 
original Dockerfile, you can customize anything (including the base OS).
+### Prerequisites
 
-### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles}
+* You will need to use Docker, either by [installing Docker tools 
locally](https://docs.docker.com/get-docker/) or using build services that can 
run Docker, such as [Google Cloud 
Build](https://cloud.google.com/cloud-build/docs/building/build-containers).
+* You will need to have a container registry accessible by your execution 
engine or runner to host a custom container image. Options include [Docker 
Hub](https://hub.docker.com/) or a "self-hosted" repository, including 
cloud-specific container registries like [Google Container 
Registry](https://cloud.google.com/container-registry) (GCR) or [Amazon Elastic 
Container Registry](https://aws.amazon.com/ecr/) (ECR).
+
+>  **NOTE**: On Nov 20, 2020, Docker Hub put [rate 
limits](https://www.docker.com/increase-rate-limits) into effect for anonymous 
and free authenticated use, which may impact larger pipelines that pull 
containers several times.
+
+For optimal user experience, we also recommend you use the latest released 
version of Beam.
+
+### Building and pushing custom containers
+
+Beam [SDK container 
images](https://hub.docker.com/search?q=apache%2Fbeam&type=image) are built 
from Dockerfiles checked into the [GitHub](https://github.com/apache/beam) 
repository and published to Docker Hub for every release. You can build 
customized containers in one of two ways:
+
+1. **[Writing a new](#writing-new-dockerfiles) Dockerfile based on a released 
container image**. This is sufficient for simple additions to the image, such 
as adding artifacts or environment variables.
+2. **[Modifying](#modifying-dockerfiles) a source Dockerfile in 
[Beam](https://github.com/apache/beam)**. This method requires building from 
Beam source but allows for greater customization of the container (including 
replacement of artifacts or base OS/language versions).
+
+#### Writing a new Dockerfile based on an existing published container image 
{#writing-new-dockerfiles}
+
+1. Create a new Dockerfile that designates a base image using the [FROM 
instruction](https://docs.docker.com/engine/reference/builder/#from).
+
+```
+FROM apache/beam_python3.7_sdk:2.25.0
+
+ENV FOO=bar
+COPY /src/path/to/file /dest/path/to/file/
+```
+
+This `Dockerfile` uses the prebuilt Python 3.7 SDK container image 
[`beam_python3.7_sdk`](https://hub.docker.com/r/apache/beam_python3.7_sdk) 
tagged at SDK version `2.25.0`, and adds an environment variable and a file 
to the image.
+
+
+2. [Build](https://docs.docker.com/engine/reference/commandline/build/) the 
image using Docker.
+
+  ```
+  export BASE_IMAGE="apache/beam_python3.7_sdk:2.25.0"
+  export IMAGE_NAME="myremoterepo/mybeamsdk"
+  export TAG="latest"
+
+  # Optional - pull the base image into your local Docker daemon to ensure
+  # you have the most up-to-date version of the base image locally.
+  docker pull "${BASE_IMAGE}"
+
+  docker build -f Dockerfile -t "${IMAGE_NAME}:${TAG}" .
+  ```
+
+3. If your runner is running remotely, 
[push](https://docs.docker.com/engine/reference/commandline/push/) the image 
using Docker to a remote repository accessible by your runner.
+
+  ```
+  docker push "${IMAGE_NAME}:${TAG}"
+  ```
+
+4. After pushing a container image, verify that the remote image ID and 
digest match the local image ID and digest shown in the output of 
`docker build` or `docker images`.
+
+#### Modifying a source Dockerfile in Beam {#modifying-dockerfiles}
+
+This method will require building image artifacts from Beam source. For 
additional instructions on setting up your development environment, see the 
[Contribution guide](/contribute/#development-setup).
+
+>**NOTE**: It is recommended that you start from a stable release branch 
(`release-X.XX.X`) corresponding to the SDK version you use to run your 
pipeline. Differences in SDK version may result in unexpected errors.
+
+1. Clone the `beam` repository.
+
+  ```
+  export BEAM_SDK_VERSION="2.26.0"
+  git clone https://github.com/apache/beam.git
+  cd beam
+
+  # Save current directory as working directory
+  export BEAM_WORKDIR=$PWD
+
+  git checkout origin/release-$BEAM_SDK_VERSION
+  ```
+
+2. Customize the `Dockerfile` for a given language, typically found at 
`sdks/<language>/container/Dockerfile` (e.g. the [Dockerfile for 
Python](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile)).
 If you're adding dependencies from [PyPI](https://pypi.org/), use 
[`base_image_requirements.txt`](https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt)
 instead.
+
+3. Return to the root Beam directory and run the Gradle `docker` target for 
your image.
+
+  ```
+  cd $BEAM_WORKDIR
+
+  # The default repository of each SDK
+  ./gradlew :sdks:java:container:java8:docker
+  ./gradlew :sdks:java:container:java11:docker
+  ./gradlew :sdks:go:container:docker
+  ./gradlew :sdks:python:container:py36:docker
+  ./gradlew :sdks:python:container:py37:docker
+  ./gradlew :sdks:python:container:py38:docker
+
+  # Shortcut for building all Python SDKs
+  ./gradlew :sdks:python:container buildAll
+  ```
+
+4. Verify the images you built were created by running `docker images`.
+
+  ```
+  $> docker images --digests
+  REPOSITORY                         TAG                  DIGEST               
    IMAGE ID         CREATED           SIZE
+  apache/beam_java8_sdk              latest               sha256:...           
    ...              1 min ago         ...
+  apache/beam_java11_sdk             latest               sha256:...           
    ...              1 min ago         ...
+  apache/beam_python3.6_sdk          latest               sha256:...           
    ...              1 min ago         ...
+  apache/beam_python3.7_sdk          latest               sha256:...           
    ...              1 min ago         ...
+  apache/beam_python3.8_sdk          latest               sha256:...           
    ...              1 min ago         ...
+  apache/beam_go_sdk                 latest               sha256:...           
    ...              1 min ago         ...
+  ```
+
+5. If your runner is running remotely, you will need to retag the image and 
[push](https://docs.docker.com/engine/reference/commandline/push/) the image 
using Docker to a remote repository accessible by your runner.
+   You can also provide a custom repo/tag as [additional 
parameters](#additional-build-parameters).
+
+  ```
+  export BEAM_SDK_VERSION="2.26.0"
+  export IMAGE_NAME="gcr.io/my-gcp-project/beam_python3.7_sdk"
+  export TAG="${BEAM_SDK_VERSION}-custom"
+
+  docker tag apache/beam_python3.7_sdk "${IMAGE_NAME}:${TAG}"
+  docker push "${IMAGE_NAME}:${TAG}"
+  ```
+
+6. After pushing a container image, verify that the remote image ID and 
digest match the local image ID and digest output from 
`docker images --digests`.
+
+#### Additional build parameters {#additional-build-parameters}
+
+The `docker` Gradle task defines a default image repository and 
[tag](https://docs.docker.com/engine/reference/commandline/tag/). The default 
repository is the Docker Hub `apache` namespace, and the default tag is the 
[SDK version](https://github.com/apache/beam/blob/master/gradle.properties) 
defined in `gradle.properties`.
+
+You can specify a different repository or tag for built images by providing 
parameters to the build task. For example:
 
-1. Pull a [prebuilt SDK container 
image](https://hub.docker.com/search?q=apache%2Fbeam&type=image) for your 
[target](https://docs.docker.com/docker-hub/repos/#searching-for-repositories) 
language and version. The following example pulls the latest Python SDK:
 ```
-docker pull apache/beam_python3.7_sdk
+./gradlew :sdks:python:container:py36:docker 
-Pdocker-repository-root="example-repo" -Pdocker-tag="2.26.0-custom"
 ```
-2. [Write a new 
Dockerfile](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/)
 that [designates](https://docs.docker.com/engine/reference/builder/#from) the 
original as its [parent](https://docs.docker.com/glossary/?term=parent%20image).
-3. [Build](#building-container-images) a child image.
 
-### Modifying the original Dockerfile {#modifying-dockerfiles}
+builds the Python 3.6 container and tags it as 
`example-repo/beam_python3.6_sdk:2.26.0-custom`.
+
+Beam 2.21.0 introduced the `docker-pull-licenses` flag, which adds 
licenses/notices for third-party dependencies to the Docker images. For 
example:
 
-1. Clone the `beam` repository:
 ```
-git clone https://github.com/apache/beam.git
+./gradlew :sdks:java:container:java8:docker -Pdocker-pull-licenses
 ```
-2. Customize the 
[Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile).
 If you're adding dependencies from [PyPI](https://pypi.org/), use 
[`base_image_requirements.txt`](https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt)
 instead.
-3. [Reimage](#building-container-images) the container.
+creates a Java 8 SDK image with appropriate licenses in 
`/opt/apache/beam/third_party_licenses/`.
 
-### Testing customized images
+By default, no licenses/notices are added to the docker images.
+
+
+## Running pipelines with custom container images {#running-pipelines}
 
-To test a customized image locally, run a pipeline with PortableRunner and set 
the `--environment_config` flag to the image path:
+The common method for providing a container image is the PortableRunner
+flag `--environment_config`, which is supported by the Portable Runner and
+by runners that support PortableRunner flags.
+Other runners, such as Dataflow, support specifying containers with different 
flags.
+
+<!--
+  TODO(emilymye): Should be updated to PortableRunner flag 
--environment_options
+ (added in 2.25.0) once this flag has been validated and ported over to all
+ runners
+-->
 
 {{< highlight class="runner-direct" >}}
+export IMAGE="my-repo/beam_python_sdk_custom"
+export TAG="X.Y.Z"
+export IMAGE_URL="${IMAGE}:${TAG}"
+
 python -m apache_beam.examples.wordcount \
 --input=/path/to/inputfile \
 --output /path/to/write/counts \
 --runner=PortableRunner \
 --job_endpoint=embed \
---environment_config=path/to/container/image
+--environment_type="DOCKER" \
+--environment_config="${IMAGE_URL}"
 {{< /highlight >}}
 
 {{< highlight class="runner-flink-local" >}}
-# Start a Flink job server on localhost:8099
-./gradlew :runners:flink:1.8:job-server:runShadow
+export IMAGE="my-repo/beam_python_sdk_custom"
+export TAG="X.Y.Z"
+export IMAGE_URL="${IMAGE}:${TAG}"
 
-# Run a pipeline on the Flink job server
+# Run a pipeline using the FlinkRunner which starts a Flink job server.
 python -m apache_beam.examples.wordcount \
 --input=/path/to/inputfile \
---output=/path/to/write/counts \
---runner=PortableRunner \
---job_endpoint=localhost:8099 \
---environment_config=path/to/container/image
+--output=path/to/write/counts \
+--runner=FlinkRunner \
+--environment_type="DOCKER" \
+--environment_config="${IMAGE_URL}"
 {{< /highlight >}}
 
 {{< highlight class="runner-spark-local" >}}
-# Start a Spark job server on localhost:8099
-./gradlew :runners:spark:job-server:runShadow
+export IMAGE="my-repo/beam_python_sdk_custom"
+export TAG="X.Y.Z"
+export IMAGE_URL="${IMAGE}:${TAG}"
 
-# Run a pipeline on the Spark job server
+# Run a pipeline using the SparkRunner which starts the Spark job server
 python -m apache_beam.examples.wordcount \
 --input=/path/to/inputfile \
 --output=path/to/write/counts \
---runner=PortableRunner \
---job_endpoint=localhost:8099 \
---environment_config=path/to/container/image
+--runner=SparkRunner \
+--environment_type="DOCKER" \
+--environment_config="${IMAGE_URL}"
 {{< /highlight >}}
 
-## Building container images
+{{< highlight class="runner-dataflow" >}}
+export GCS_PATH="gs://my-gcs-bucket"
+export GCP_PROJECT="my-gcp-project"
+export REGION="us-central1"
 
-To build Beam SDK container images:
+# By default, the Dataflow runner will have access to the GCR images
+# under the same project.
+export IMAGE="my-repo/beam_python_sdk_custom"
+export TAG="X.Y.Z"
+export IMAGE_URL="${IMAGE}:${TAG}"
 
-1. Navigate to the root directory of the local copy of your Apache Beam.
-2. Run Gradle with the `docker` target. If you're [building a child 
image](#writing-new-dockerfiles), set the optional `--file` flag to the new 
Dockerfile. If you're [building an image from an original 
Dockerfile](#modifying-dockerfiles), ignore the `--file` flag:
+# Run a pipeline on Dataflow.
+# This is a Python batch pipeline, so to run on Dataflow Runner V2
+# you must specify the experiment "use_runner_v2"
 
-```
-# The default repository of each SDK
-./gradlew [--file=path/to/new/Dockerfile] :sdks:java:container:java8:docker
-./gradlew [--file=path/to/new/Dockerfile] :sdks:java:container:java11:docker
-./gradlew [--file=path/to/new/Dockerfile] :sdks:go:container:docker
-./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container:py2:docker
-./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container:py35:docker
-./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container:py36:docker
-./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container:py37:docker
-
-# Shortcut for building all four Python SDKs
-./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container buildAll
-```
-
-From 2.21.0, `docker-pull-licenses` tag was introduced. Licenses/notices of 
third party dependencies will be added to the docker images when 
`docker-pull-licenses` was set.
-For example, `./gradlew :sdks:java:container:java8:docker 
-Pdocker-pull-licenses`. The files are added to 
`/opt/apache/beam/third_party_licenses/`.
-By default, no licenses/notices are added to the docker images.
-
-To examine the containers that you built, run `docker images` from anywhere in 
the command line. If you successfully built all of the container images, the 
command prints a table like the following:
-```
-REPOSITORY                         TAG                 IMAGE ID            
CREATED           SIZE
-apache/beam_java8_sdk              latest              ...                 2 
weeks ago       ...
-apache/beam_java11_sdk             latest              ...                 2 
weeks ago       ...
-apache/beam_python2.7_sdk          latest              ...                 2 
weeks ago       ...
-apache/beam_python3.5_sdk          latest              ...                 2 
weeks ago       ...
-apache/beam_python3.6_sdk          latest              ...                 2 
weeks ago       ...
-apache/beam_python3.7_sdk          latest              ...                 2 
weeks ago       ...
-apache/beam_go_sdk                 latest              ...                 2 
weeks ago       ...
-```
-
-### Overriding default Docker targets
-
-The default [tag](https://docs.docker.com/engine/reference/commandline/tag/) 
is sdk_version defined at 
[gradle.properties](https://github.com/apache/beam/blob/master/gradle.properties)
 and the default repositories are in the Docker Hub `apache` namespace.
-The `docker` command-line tool implicitly [pushes container 
images](#pushing-container-images) to this location.
-
-To tag a local image, set the `docker-tag` option when building the container. 
The following command tags a Python SDK image with a date.
-```
-./gradlew :sdks:python:container:py36:docker -Pdocker-tag=2019-10-04
-```
-
-To change the repository, set the `docker-repository-root` option to a new 
location. The following command sets the `docker-repository-root`
-to a repository named `example-repo` on Docker Hub.
-```
-./gradlew :sdks:python:container:py36:docker 
-Pdocker-repository-root=example-repo
-```
+python -m apache_beam.examples.wordcount \
+  --input gs://dataflow-samples/shakespeare/kinglear.txt \
+  --output "${GCS_PATH}/counts" \
+  --runner DataflowRunner \
+  --project $GCP_PROJECT \
+  --region $REGION \
+  --temp_location "${GCS_PATH}/tmp/" \
+  --experiment=use_runner_v2 \
+  --worker_harness_container_image=$IMAGE_URL
 
-## Pushing container images
+{{< /highlight >}}
 
-After [building a container image](#building-container-images), you can store 
it in a remote Docker repository.
 
-The following steps push a Python3.6 SDK image to the 
[`docker-root-repository` value](#overriding-default-docker-targets).
-Please log in to the destination repository as needed.
+### Troubleshooting/TIps

Review comment:
       Typo here--I also think you can just call this section "Troubleshooting" 
or "Considerations"
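
For reference, the runner examples in this hunk compose the image reference from `IMAGE` and `TAG`. A minimal shell sketch of that step follows, using the placeholder values from the hunk. The `image_url` helper is a hypothetical name used only for illustration and is not part of Beam or Docker. Note that shell assignments take no spaces around `=`.

```shell
# Hypothetical helper (illustration only): join an image name and a tag
# into a full image reference of the form "name:tag".
image_url() {
  printf '%s:%s' "$1" "$2"
}

# Placeholder values copied from the examples in the hunk.
IMAGE="my-repo/beam_python_sdk_custom"
TAG="X.Y.Z"

# Assign without spaces around "=", then export the variable.
IMAGE_URL="$(image_url "${IMAGE}" "${TAG}")"
export IMAGE_URL

echo "${IMAGE_URL}"
```

The resulting `IMAGE_URL` is the value the examples pass to `--environment_config`.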

##########
File path: website/www/site/content/en/documentation/runtime/environments.md
##########
@@ -17,147 +17,255 @@ limitations under the License.
 
 # Container environments
 
-The Beam SDK runtime environment is isolated from other runtime systems 
because the SDK runtime environment is 
[containerized](https://s.apache.org/beam-fn-api-container-contract) with 
[Docker](https://www.docker.com/). This means that any execution engine can run 
the Beam SDK.
+The Beam SDK runtime environment can be 
[containerized](https://www.docker.com/resources/what-container) with 
[Docker](https://www.docker.com/) to isolate it from other runtime systems. To 
learn more about the container environment, read the Beam [SDK Harness 
container contract](https://s.apache.org/beam-fn-api-container-contract).
 
-This page describes how to customize, build, and push Beam SDK container 
images.
+Prebuilt SDK container images are released per supported language during Beam 
releases and pushed to [Docker 
Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image).
 
-Before you begin, install [Docker](https://www.docker.com/) on your 
workstation.
+## Custom containers
 
-## Customizing container images
+You may want to customize container images for many reasons, including:
 
-You can add extra dependencies to container images so that you don't have to 
supply the dependencies to execution engines.
+* Pre-installing additional dependencies
+* Launching third-party software in the worker environment
+* Further customizing the execution environment
 
-To customize a container image, either:
-* [Write a new](#writing-new-dockerfiles) 
[Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the 
original.
-* [Modify](#modifying-dockerfiles) the [original 
Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile)
 and reimage the container.
+ This guide describes how to create and use customized containers for the Beam 
SDK.
 
-It's often easier to write a new Dockerfile. However, by modifying the 
original Dockerfile, you can customize anything (including the base OS).
+### Prerequisites
 
-### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles}
+* You will need to use Docker, either by [installing Docker tools 
locally](https://docs.docker.com/get-docker/) or using build services that can 
run Docker, such as [Google Cloud 
Build](https://cloud.google.com/cloud-build/docs/building/build-containers).
+* You will need to have a container registry accessible by your execution 
engine or runner to host a custom container image. Options include [Docker 
Hub](https://hub.docker.com/) or a "self-hosted" repository, including 
cloud-specific container registries like [Google Container 
Registry](https://cloud.google.com/container-registry) (GCR) or [Amazon Elastic 
Container Registry](https://aws.amazon.com/ecr/) (ECR).
+
+>  **NOTE**: On Nov 20, 2020, Docker Hub put [rate 
limits](https://www.docker.com/increase-rate-limits) into effect for anonymous 
and free authenticated use, which may impact larger pipelines that pull 
containers several times.
+
+For optimal user experience, we also recommend you use the latest released 
version of Beam.
+
+### Building and pushing custom containers
+
+Beam [SDK container 
images](https://hub.docker.com/search?q=apache%2Fbeam&type=image) are built 
from Dockerfiles checked into the [GitHub](https://github.com/apache/beam) 
repository and published to Docker Hub for every release. You can build 
customized containers in one of two ways:
+
+1. **[Writing a new](#writing-new-dockerfiles) Dockerfile based on a released 
container image**. This is sufficient for simple additions to the image, such 
as adding artifacts or environment variables.
+2. **[Modifying](#modifying-dockerfiles) a source Dockerfile in 
[Beam](https://github.com/apache/beam)**. This method requires building from 
Beam source but allows for greater customization of the container (including 
replacement of artifacts or base OS/language versions).
+
+#### Writing a new Dockerfile based on an existing published container image 
{#writing-new-dockerfiles}
+
+1. Create a new Dockerfile that designates a base image using the [FROM 
instruction](https://docs.docker.com/engine/reference/builder/#from).
+
+```
+FROM apache/beam_python3.7_sdk:2.25.0
+
+ENV FOO=bar
+COPY /src/path/to/file /dest/path/to/file/
+```
+
+This `Dockerfile` uses the prebuilt Python 3.7 SDK container image 
[`beam_python3.7_sdk`](https://hub.docker.com/r/apache/beam_python3.7_sdk) 
tagged at SDK version `2.25.0`, and adds an environment variable and a file 
to the image.
+
+
+2. [Build](https://docs.docker.com/engine/reference/commandline/build/) the 
image using Docker.
+
+  ```
+  export BASE_IMAGE="apache/beam_python3.7_sdk:2.25.0"
+  export IMAGE_NAME="myremoterepo/mybeamsdk"
+  export TAG="latest"
+
+  # Optional - pull the base image into your local Docker daemon to ensure
+  # you have the most up-to-date version of the base image locally.
+  docker pull "${BASE_IMAGE}"
+
+  docker build -f Dockerfile -t "${IMAGE_NAME}:${TAG}" .
+  ```
+
+3. If your runner is running remotely, 
[push](https://docs.docker.com/engine/reference/commandline/push/) the image 
using Docker to a remote repository accessible by your runner.
+
+  ```
+  docker push "${IMAGE_NAME}:${TAG}"
+  ```
+
+4. After pushing a container image, verify that the remote image ID and 
digest match the local image ID and digest shown in the output of 
`docker build` or `docker images`.
+
+#### Modifying a source Dockerfile in Beam {#modifying-dockerfiles}
+
+This method will require building image artifacts from Beam source. For 
additional instructions on setting up your development environment, see the 
[Contribution guide](/contribute/#development-setup).
+
+>**NOTE**: It is recommended that you start from a stable release branch 
(`release-X.XX.X`) corresponding to the SDK version you use to run your 
pipeline. Differences in SDK version may result in unexpected errors.
+
+1. Clone the `beam` repository.
+
+  ```
+  export BEAM_SDK_VERSION="2.26.0"
+  git clone https://github.com/apache/beam.git
+  cd beam
+
+  # Save current directory as working directory
+  export BEAM_WORKDIR=$PWD
+
+  git checkout origin/release-$BEAM_SDK_VERSION
+  ```
+
+2. Customize the `Dockerfile` for a given language, typically found at 
`sdks/<language>/container/Dockerfile` (e.g. the [Dockerfile for 
Python](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile)).
 If you're adding dependencies from [PyPI](https://pypi.org/), use 
[`base_image_requirements.txt`](https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt)
 instead.
+
+3. Return to the root Beam directory and run the Gradle `docker` target for 
your image.
+
+  ```
+  cd $BEAM_WORKDIR
+
+  # The default repository of each SDK
+  ./gradlew :sdks:java:container:java8:docker
+  ./gradlew :sdks:java:container:java11:docker
+  ./gradlew :sdks:go:container:docker
+  ./gradlew :sdks:python:container:py36:docker
+  ./gradlew :sdks:python:container:py37:docker
+  ./gradlew :sdks:python:container:py38:docker
+
+  # Shortcut for building all Python SDKs
+  ./gradlew :sdks:python:container buildAll
+  ```
+
+4. Verify the images you built were created by running `docker images`.
+
+  ```
+  $> docker images --digests
+  REPOSITORY                         TAG                  DIGEST               
    IMAGE ID         CREATED           SIZE
+  apache/beam_java8_sdk              latest               sha256:...           
    ...              1 min ago         ...
+  apache/beam_java11_sdk             latest               sha256:...           
    ...              1 min ago         ...
+  apache/beam_python3.6_sdk          latest               sha256:...           
    ...              1 min ago         ...
+  apache/beam_python3.7_sdk          latest               sha256:...           
    ...              1 min ago         ...
+  apache/beam_python3.8_sdk          latest               sha256:...           
    ...              1 min ago         ...
+  apache/beam_go_sdk                 latest               sha256:...           
    ...              1 min ago         ...
+  ```
+
+5. If your runner is running remotely, you will need to retag the image and 
[push](https://docs.docker.com/engine/reference/commandline/push/) the image 
using Docker to a remote repository accessible by your runner.
+   You can also provide a custom repo/tag as [additional 
parameters](#additional-build-parameters).
+
+  ```
+  export BEAM_SDK_VERSION="2.26.0"
+  export IMAGE_NAME="gcr.io/my-gcp-project/beam_python3.7_sdk"
+  export TAG="${BEAM_SDK_VERSION}-custom"
+
+  docker tag apache/beam_python3.7_sdk "${IMAGE_NAME}:${TAG}"
+  docker push "${IMAGE_NAME}:${TAG}"
+  ```
+
+6. After pushing a container image, verify that the remote image ID and 
digest match the local image ID and digest output from 
`docker images --digests`.
+
+#### Additional build parameters {#additional-build-parameters}
+
+The `docker` Gradle task defines a default image repository and 
[tag](https://docs.docker.com/engine/reference/commandline/tag/). The default 
repository is the Docker Hub `apache` namespace, and the default tag is the 
[SDK version](https://github.com/apache/beam/blob/master/gradle.properties) 
defined in `gradle.properties`.
+
+You can specify a different repository or tag for built images by providing 
parameters to the build task. For example:
 
-1. Pull a [prebuilt SDK container 
image](https://hub.docker.com/search?q=apache%2Fbeam&type=image) for your 
[target](https://docs.docker.com/docker-hub/repos/#searching-for-repositories) 
language and version. The following example pulls the latest Python SDK:
 ```
-docker pull apache/beam_python3.7_sdk
+./gradlew :sdks:python:container:py36:docker 
-Pdocker-repository-root="example-repo" -Pdocker-tag="2.26.0-custom"
 ```
-2. [Write a new 
Dockerfile](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/)
 that [designates](https://docs.docker.com/engine/reference/builder/#from) the 
original as its [parent](https://docs.docker.com/glossary/?term=parent%20image).
-3. [Build](#building-container-images) a child image.
 
-### Modifying the original Dockerfile {#modifying-dockerfiles}
+builds the Python 3.6 container and tags it as 
`example-repo/beam_python3.6_sdk:2.26.0-custom`.
+
+Beam 2.21.0 introduced the `docker-pull-licenses` flag, which adds 
licenses/notices for third-party dependencies to the Docker images. For 
example:
 
-1. Clone the `beam` repository:
 ```
-git clone https://github.com/apache/beam.git
+./gradlew :sdks:java:container:java8:docker -Pdocker-pull-licenses
 ```
-2. Customize the 
[Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile).
 If you're adding dependencies from [PyPI](https://pypi.org/), use 
[`base_image_requirements.txt`](https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt)
 instead.
-3. [Reimage](#building-container-images) the container.
+creates a Java 8 SDK image with appropriate licenses in 
`/opt/apache/beam/third_party_licenses/`.
 
-### Testing customized images
+By default, no licenses/notices are added to the docker images.
+
+
+## Running pipelines with custom container images {#running-pipelines}
 
-To test a customized image locally, run a pipeline with PortableRunner and set 
the `--environment_config` flag to the image path:
+The common method for providing a container image is the PortableRunner
+flag `--environment_config`, which is supported by the Portable Runner and
+by runners that support PortableRunner flags.
+Other runners, such as Dataflow, support specifying containers with different 
flags.
+
+<!--
+  TODO(emilymye): Should be updated to PortableRunner flag 
--environment_options
+ (added in 2.25.0) once this flag has been validated and ported over to all
+ runners
+-->
 
 {{< highlight class="runner-direct" >}}
+export IMAGE="my-repo/beam_python_sdk_custom"
+export TAG="X.Y.Z"
+export IMAGE_URL="${IMAGE}:${TAG}"
+
 python -m apache_beam.examples.wordcount \
 --input=/path/to/inputfile \
 --output /path/to/write/counts \
 --runner=PortableRunner \
 --job_endpoint=embed \
---environment_config=path/to/container/image
+--environment_type="DOCKER" \
+--environment_config="${IMAGE_URL}"
 {{< /highlight >}}
 
 {{< highlight class="runner-flink-local" >}}
-# Start a Flink job server on localhost:8099
-./gradlew :runners:flink:1.8:job-server:runShadow
+export IMAGE="my-repo/beam_python_sdk_custom"
+export TAG="X.Y.Z"
+export IMAGE_URL="${IMAGE}:${TAG}"
 
-# Run a pipeline on the Flink job server
+# Run a pipeline using the FlinkRunner which starts a Flink job server.
 python -m apache_beam.examples.wordcount \
 --input=/path/to/inputfile \
---output=/path/to/write/counts \
---runner=PortableRunner \
---job_endpoint=localhost:8099 \
---environment_config=path/to/container/image
+--output=path/to/write/counts \
+--runner=FlinkRunner \
+--environment_type="DOCKER" \
+--environment_config="${IMAGE_URL}"
 {{< /highlight >}}
 
 {{< highlight class="runner-spark-local" >}}
-# Start a Spark job server on localhost:8099
-./gradlew :runners:spark:job-server:runShadow
+export IMAGE="my-repo/beam_python_sdk_custom"
+export TAG="X.Y.Z"
+export IMAGE_URL="${IMAGE}:${TAG}"
 
-# Run a pipeline on the Spark job server
+# Run a pipeline using the SparkRunner which starts the Spark job server
 python -m apache_beam.examples.wordcount \
 --input=/path/to/inputfile \
 --output=path/to/write/counts \
---runner=PortableRunner \
---job_endpoint=localhost:8099 \
---environment_config=path/to/container/image
+--runner=SparkRunner \
+--environment_type="DOCKER" \
+--environment_config="${IMAGE_URL}"
 {{< /highlight >}}
 
-## Building container images
+{{< highlight class="runner-dataflow" >}}
+export GCS_PATH="gs://my-gcs-bucket"
+export GCP_PROJECT="my-gcp-project"
+export REGION="us-central1"
 
-To build Beam SDK container images:
+# By default, the Dataflow runner will have access to the GCR images
+# under the same project.
+export IMAGE="my-repo/beam_python_sdk_custom"
+export TAG="X.Y.Z"
+export IMAGE_URL="${IMAGE}:${TAG}"
 
-1. Navigate to the root directory of the local copy of your Apache Beam.
-2. Run Gradle with the `docker` target. If you're [building a child 
image](#writing-new-dockerfiles), set the optional `--file` flag to the new 
Dockerfile. If you're [building an image from an original 
Dockerfile](#modifying-dockerfiles), ignore the `--file` flag:
+# Run a pipeline on Dataflow.
+# This is a Python batch pipeline, so to run on Dataflow Runner V2
+# you must specify the experiment "use_runner_v2"
 
-```
-# The default repository of each SDK
-./gradlew [--file=path/to/new/Dockerfile] :sdks:java:container:java8:docker
-./gradlew [--file=path/to/new/Dockerfile] :sdks:java:container:java11:docker
-./gradlew [--file=path/to/new/Dockerfile] :sdks:go:container:docker
-./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container:py2:docker
-./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container:py35:docker
-./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container:py36:docker
-./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container:py37:docker
-
-# Shortcut for building all four Python SDKs
-./gradlew [--file=path/to/new/Dockerfile] :sdks:python:container buildAll
-```
-
-From 2.21.0, `docker-pull-licenses` tag was introduced. Licenses/notices of 
third party dependencies will be added to the docker images when 
`docker-pull-licenses` was set.
-For example, `./gradlew :sdks:java:container:java8:docker 
-Pdocker-pull-licenses`. The files are added to 
`/opt/apache/beam/third_party_licenses/`.
-By default, no licenses/notices are added to the docker images.
-
-To examine the containers that you built, run `docker images` from anywhere in 
the command line. If you successfully built all of the container images, the 
command prints a table like the following:
-```
-REPOSITORY                         TAG                 IMAGE ID            
CREATED           SIZE
-apache/beam_java8_sdk              latest              ...                 2 
weeks ago       ...
-apache/beam_java11_sdk             latest              ...                 2 
weeks ago       ...
-apache/beam_python2.7_sdk          latest              ...                 2 
weeks ago       ...
-apache/beam_python3.5_sdk          latest              ...                 2 
weeks ago       ...
-apache/beam_python3.6_sdk          latest              ...                 2 
weeks ago       ...
-apache/beam_python3.7_sdk          latest              ...                 2 
weeks ago       ...
-apache/beam_go_sdk                 latest              ...                 2 
weeks ago       ...
-```
-
-### Overriding default Docker targets
-
-The default [tag](https://docs.docker.com/engine/reference/commandline/tag/) 
is sdk_version defined at 
[gradle.properties](https://github.com/apache/beam/blob/master/gradle.properties)
 and the default repositories are in the Docker Hub `apache` namespace.
-The `docker` command-line tool implicitly [pushes container 
images](#pushing-container-images) to this location.
-
-To tag a local image, set the `docker-tag` option when building the container. 
The following command tags a Python SDK image with a date.
-```
-./gradlew :sdks:python:container:py36:docker -Pdocker-tag=2019-10-04
-```
-
-To change the repository, set the `docker-repository-root` option to a new 
location. The following command sets the `docker-repository-root`
-to a repository named `example-repo` on Docker Hub.
-```
-./gradlew :sdks:python:container:py36:docker 
-Pdocker-repository-root=example-repo
-```
+python -m apache_beam.examples.wordcount \
+  --input gs://dataflow-samples/shakespeare/kinglear.txt \
+  --output "${GCS_PATH}/counts" \
+  --runner DataflowRunner \
+  --project $GCP_PROJECT \
+  --region $REGION \
+  --temp_location "${GCS_PATH}/tmp/" \
+  --experiment=use_runner_v2 \
+  --worker_harness_container_image=$IMAGE_URL
 
-## Pushing container images
+{{< /highlight >}}
 
-After [building a container image](#building-container-images), you can store 
it in a remote Docker repository.
 
-The following steps push a Python3.6 SDK image to the 
[`docker-root-repository` value](#overriding-default-docker-targets).
-Please log in to the destination repository as needed.
+### Troubleshooting/Tips
 
-Upload it to the remote repository:
-```
-docker push example-repo/beam_python3.6_sdk
-```
-
-To download the image again, run `docker pull`:
-```
-docker pull example-repo/beam_python3.6_sdk
-```
+* Differences in language and SDK version between the container SDK and

Review comment:
       Can you write an introductory sentence for this list?
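
A side note on the shell snippets in this hunk: the image URL is composed from the `IMAGE` and `TAG` variables, and in bash an `export` assignment only works without spaces around `=`. A minimal sketch of the composition, using the same placeholder values as the hunk (they are illustrative, not a real repository or version):

```shell
# Placeholder values, copied from the examples in the hunk above.
IMAGE="my-repo/beam_python_sdk_custom"
TAG="X.Y.Z"

# No spaces around "=": with spaces, bash passes "=" and the value to
# `export` as separate arguments and reports "not a valid identifier".
export IMAGE_URL="${IMAGE}:${TAG}"

# Print the composed value that would be passed to --environment_config.
echo "${IMAGE_URL}"
```

With these placeholders, the script prints `my-repo/beam_python_sdk_custom:X.Y.Z`.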





