pcoet commented on a change in pull request #16938:
URL: https://github.com/apache/beam/pull/16938#discussion_r830322738
##########
File path: website/www/site/content/en/documentation/runtime/environments.md
##########
@@ -171,6 +171,48 @@ creates a Java 8 SDK image with appropriate licenses in
`/opt/apache/beam/third_
By default, no licenses/notices are added to the docker images.
+#### Modifying an existing container image to make it compatible with Apache
Beam Runners {#modify-existing-base-image}
+Beam offers a way to provide your own custom container image. The easiest way
to build a new custom image that is compatible with Apache Beam Runners is to
use a [multi-stage
build](https://docs.docker.com/develop/develop-images/multistage-build/)
process. This copies over the necessary artifacts from a default Apache Beam
base image to build your custom container image.
+
+1. Copy necessary artifacts from Apache Beam base image to your image.
+ ```
+ # This can be any container image,
+ FROM python:3.7-bullseye
+
+ # Install SDK. (needed for Python SDK)
+ RUN pip install --no-cache-dir apache-beam[gcp]==2.35.0
+
+ # Copy files from official SDK image, including script/dependencies.
+ COPY --from=apache/beam_python3.7_sdk:2.35.0 /opt/apache/beam /opt/apache/beam
+
+ # Perform any additional customizations if desired
+
+ # Set the entrypoint to Apache Beam SDK launcher.
+ ENTRYPOINT ["/opt/apache/beam/boot"]
+
+ ```
+>**NOTE**: This example assumes necessary dependencies (in this case, Python
3.7 and pip) have been installed on the existing base image. Installing the
Apache Beam SDK into the image will ensure that the image has the necessary SDK
dependencies and reduce the worker startup time.
+>The version specified in the `RUN` instruction must match the version used to
launch the pipeline.<br>
+>**Users need to make sure that whatever base image they use has the same
Python/Java interpreter version that they used to run the pipeline**.
Review comment:
Consider: "Make sure that the Python or Java runtime version specified
in the base image is the same as the version used to run the pipeline."
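A doc reader could check this before launching. Below is a minimal sketch, assuming the image follows the `apache/beam_pythonX.Y_sdk:<tag>` naming used in the example above. The helper names (`python_version_from_image`, `versions_match`) are hypothetical and not part of Beam.

```python
import re
import sys
from typing import Optional


def python_version_from_image(image: str) -> str:
    """Extract 'X.Y' from an image name such as apache/beam_python3.7_sdk:2.35.0.

    Assumes the beam_pythonX.Y_sdk naming convention; other images need another check.
    """
    match = re.search(r"beam_python(\d+\.\d+)_sdk", image)
    if match is None:
        raise ValueError(f"No Python version found in image name: {image!r}")
    return match.group(1)


def versions_match(image: str, local: Optional[str] = None) -> bool:
    """Compare the image's Python major.minor with the launching interpreter's."""
    if local is None:
        # Default to the interpreter that is running this script.
        local = f"{sys.version_info.major}.{sys.version_info.minor}"
    return python_version_from_image(image) == local


if __name__ == "__main__":
    image = "apache/beam_python3.7_sdk:2.35.0"
    if not versions_match(image):
        print(f"Warning: local Python differs from the version in {image}")
```

This only compares major and minor versions, which is what the note in the diff asks users to keep in sync.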
##########
File path:
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -45,6 +45,17 @@ If your pipeline uses public packages from the [Python
Package Index](https://py
The runner will use the `requirements.txt` file to install your additional
dependencies onto the remote workers.
**Important:** Remote workers will install all packages listed in the
`requirements.txt` file. Because of this, it's very important that you delete
non-PyPI packages from the `requirements.txt` file, as stated in step 2. If you
don't remove non-PyPI packages, the remote workers will fail when attempting to
install packages from sources that are unknown to them.
+> **NOTE**: An alternative to `pip freeze` is to use a library like
[pip-tools](https://github.com/jazzband/pip-tools) to compile the all the
dependencies required for the pipeline from a `--requirements_file`, where only
top-level dependencies are mentioned.
Review comment:
"compile the all" -> "compile all"
##########
File path:
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -123,3 +134,25 @@ If your pipeline uses non-Python packages (e.g. packages
that require installati
--setup_file /path/to/setup.py
**Note:** Because custom commands execute after the dependencies for your
workflow are installed (by `pip`), you should omit the PyPI package dependency
from the pipeline's `requirements.txt` file and from the `install_requires`
parameter in the `setuptools.setup()` call of your `setup.py` file.
+
+## Pre-building SDK container image
+
+In pipeline execution modes where a Beam runner launches SDK workers in Docker
containers, the additional pipeline dependencies (specified via
`--requirements_file` and other runtime options) are installed into the
containers at runtime. This can increase the worker startup time.
+ However, it may be possible to pre-build the SDK containers and perform the
dependency installation once before the workers start. To pre-build the
container image before pipeline submission, provide the pipeline options
mentioned below.
+1. Provide the container engine. We support `local_docker`(requires local
installation of Docker) and `cloud_build`(requires a GCP project with Cloud
Build API enabled).
Review comment:
In general, prefer "Beam" to "we", as in "Beam supports..."
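It might also help to show how the engine choice turns into pipeline arguments. Below is a rough sketch: the two engine values and `--requirements_file` come from the diff above, while the flag name `--prebuild_sdk_container_engine` is an assumption on my part (this hunk does not show it) and should be checked against the final docs. `prebuild_args` is a hypothetical helper, not a Beam API.

```python
from typing import List

# Engine values listed in the diff under review.
SUPPORTED_ENGINES = ("local_docker", "cloud_build")


def prebuild_args(engine: str, requirements_file: str) -> List[str]:
    """Build the extra command-line arguments for pre-building the SDK container.

    The --prebuild_sdk_container_engine flag name is assumed, not taken from the diff.
    """
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(
            f"Unsupported engine {engine!r}; expected one of {SUPPORTED_ENGINES}"
        )
    return [
        f"--requirements_file={requirements_file}",
        f"--prebuild_sdk_container_engine={engine}",
    ]


if __name__ == "__main__":
    # These arguments would be appended to the usual pipeline arguments.
    print(prebuild_args("local_docker", "requirements.txt"))
```

Validating the engine name up front gives a clear error locally instead of a failure at submission time.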
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.