rosetn commented on a change in pull request #13420: URL: https://github.com/apache/beam/pull/13420#discussion_r545409463
##########
File path: website/www/site/content/en/documentation/runtime/environments.md
##########
@@ -17,147 +17,259 @@ limitations under the License.
 # Container environments
-The Beam SDK runtime environment is isolated from other runtime systems because the SDK runtime environment is [containerized](https://s.apache.org/beam-fn-api-container-contract) with [Docker](https://www.docker.com/). This means that any execution engine can run the Beam SDK.
+The Beam SDK runtime environment can be [containerized](https://www.docker.com/resources/what-container) with [Docker](https://www.docker.com/) to isolate it from other runtime systems. To learn more about the container environment, read the Beam [SDK Harness container contract](https://s.apache.org/beam-fn-api-container-contract).
-This page describes how to customize, build, and push Beam SDK container images.
+Prebuilt SDK container images are released per supported language during Beam releases and pushed to [Docker Hub](https://hub.docker.com/search?q=apache%2Fbeam&type=image).
-Before you begin, install [Docker](https://www.docker.com/) on your workstation.
+## Custom containers
-## Customizing container images
+You may want to customize container images for many reasons, including:
-You can add extra dependencies to container images so that you don't have to supply the dependencies to execution engines.
+* Pre-installing additional dependencies
+* Launching third-party software in the worker environment
+* Further customizing the execution environment
-To customize a container image, either:
-* [Write a new](#writing-new-dockerfiles) [Dockerfile](https://docs.docker.com/engine/reference/builder/) on top of the original.
-* [Modify](#modifying-dockerfiles) the [original Dockerfile](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile) and reimage the container.
+This guide describes how to create and use customized containers for the Beam SDK.
-It's often easier to write a new Dockerfile. However, by modifying the original Dockerfile, you can customize anything (including the base OS).
+### Prerequisites
-### Writing new Dockerfiles on top of the original {#writing-new-dockerfiles}
+* You will need to use Docker, either by [installing Docker tools locally](https://docs.docker.com/get-docker/) or using build services that can run Docker, such as [Google Cloud Build](https://cloud.google.com/cloud-build/docs/building/build-containers).
+* You will need to have a container registry accessible by your execution engine or runner to host a custom container image. Options include [Docker Hub](https://hub.docker.com/) or a "self-hosted" repository, including cloud-specific container registries like [Google Container Registry](https://cloud.google.com/container-registry) (GCR) or [Amazon Elastic Container Registry](https://aws.amazon.com/ecr/) (ECR).
+
+> **NOTE**: On Nov 20, 2020, Docker Hub put [rate limits](https://www.docker.com/increase-rate-limits) into effect for anonymous and free authenticated use, which may impact larger pipelines that pull containers several times.
+
+For optimal user experience, we also recommend you use the latest released version of Beam.
+
+### Building and pushing custom containers
+
+Beam [SDK container images](https://hub.docker.com/search?q=apache%2Fbeam&type=image) are built from Dockerfiles checked into the [Github](https://github.com/apache/beam) repository and published to Docker Hub for every release. You can build customized containers in one of two ways:
+
+1. **[Writing a new](#writing-new-dockerfiles) Dockerfile based on a released container image**. This is sufficient for simple additions to the image, such as adding artifacts or environment variables.
+2. **[Modifying](#modifying-dockerfiles) a source Dockerfile in [Beam](https://github.com/apache/beam)**. This method requires building from Beam source but allows for greater customization of the container (including replacement of artifacts or base OS/language versions).
+
+#### Writing a new Dockerfile based on an existing published container image {#writing-new-dockerfiles}
+
+1. Create a new Dockerfile that designates a base image using the [FROM instruction](https://docs.docker.com/engine/reference/builder/#from).
+
+```
+FROM apache/beam_python3.7_sdk:2.25.0
+
+ENV FOO=bar
+COPY /src/path/to/file /dest/path/to/file/
+```
+
+This `Dockerfile`: uses the prebuilt Python 3.7 SDK container image [`beam_python3.7_sdk`](https://hub.docker.com/r/apache/beam_python3.7_sdk) tagged at (SDK version) `2.25.0`, and adds an additional environment variable and file to the image.

Review comment:
Extra colon after `Dockerfile`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: [email protected]
