pcoet commented on a change in pull request #16938:
URL: https://github.com/apache/beam/pull/16938#discussion_r830322738



##########
File path: website/www/site/content/en/documentation/runtime/environments.md
##########
@@ -171,6 +171,48 @@ creates a Java 8 SDK image with appropriate licenses in `/opt/apache/beam/third_
 
 By default, no licenses/notices are added to the docker images.
 
+#### Modifying an existing container image to make it compatible with Apache Beam Runners {#modify-existing-base-image}
+Beam offers a way to provide your own custom container image. The easiest way to build a new custom image that is compatible with Apache Beam Runners is to use a [multi-stage build](https://docs.docker.com/develop/develop-images/multistage-build/) process. This copies over the necessary artifacts from a default Apache Beam base image to build your custom container image.
+
+1. Copy necessary artifacts from Apache Beam base image to your image.
+  ```
+  # This can be any container image,
+ FROM python:3.7-bullseye
+
+ # Install SDK. (needed for Python SDK)
+ RUN pip install --no-cache-dir apache-beam[gcp]==2.35.0
+
+ # Copy files from official SDK image, including script/dependencies.
+ COPY --from=apache/beam_python3.7_sdk:2.35.0 /opt/apache/beam /opt/apache/beam
+
+ # Perform any additional customizations if desired
+
+ # Set the entrypoint to Apache Beam SDK launcher.
+ ENTRYPOINT ["/opt/apache/beam/boot"]
+
+  ```
+>**NOTE**: This example assumes necessary dependencies (in this case, Python 3.7 and pip) have been installed on the existing base image. Installing the Apache Beam SDK into the image will ensure that the image has the necessary SDK dependencies and reduce the worker startup time.
+>The version specified in the `RUN` instruction must match the version used to launch the pipeline.<br>
+>**Users need to make sure that whatever base image they use has the same Python/Java interpreter version that they used to run the pipeline**.

Review comment:
       Consider: "Make sure that the Python or Java runtime version specified 
in the base image is the same as the version used to run the pipeline."
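       If it would help readers, the doc could also show how to check this before launching. The following is only a minimal sketch: `image_python_version` and `runtime_matches_image` are hypothetical helpers, not part of Beam, and the sketch assumes the image tag begins with `<major>.<minor>`, as in `python:3.7-bullseye`.

```python
import re
import sys


def image_python_version(image_ref):
    """Return (major, minor) parsed from a tag such as 'python:3.7-bullseye'.

    Hypothetical helper: assumes the tag starts with '<major>.<minor>'.
    """
    # Take the text after the last ':' so registry ports are not mistaken for tags.
    _, _, tag = image_ref.rpartition(":")
    match = re.match(r"(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"cannot read a Python version from {image_ref!r}")
    return int(match.group(1)), int(match.group(2))


def runtime_matches_image(image_ref, runtime=None):
    """Return True when the interpreter's major.minor equals the image's."""
    # Default to the interpreter that is launching the pipeline.
    if runtime is None:
        runtime = sys.version_info[:2]
    return image_python_version(image_ref) == tuple(runtime)
```

       A launcher script could call `runtime_matches_image("python:3.7-bullseye")` and stop early on `False`, rather than failing later on the workers.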

##########
File path: website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -45,6 +45,17 @@ If your pipeline uses public packages from the [Python Package Index](https://py
     The runner will use the `requirements.txt` file to install your additional dependencies onto the remote workers.
 
 **Important:** Remote workers will install all packages listed in the `requirements.txt` file. Because of this, it's very important that you delete non-PyPI packages from the `requirements.txt` file, as stated in step 2. If you don't remove non-PyPI packages, the remote workers will fail when attempting to install packages from sources that are unknown to them.
+> **NOTE**: An alternative to `pip freeze` is to use a library like [pip-tools](https://github.com/jazzband/pip-tools) to compile the all the dependencies required for the pipeline from a `--requirements_file`, where only top-level dependencies are mentioned.

Review comment:
       "compile the all" -> "compile all"

##########
File path: website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -123,3 +134,25 @@ If your pipeline uses non-Python packages (e.g. packages that require installati
         --setup_file /path/to/setup.py
 
 **Note:** Because custom commands execute after the dependencies for your workflow are installed (by `pip`), you should omit the PyPI package dependency from the pipeline's `requirements.txt` file and from the `install_requires` parameter in the `setuptools.setup()` call of your `setup.py` file.
+
+## Pre-building SDK container image
+
+In pipeline execution modes where a Beam runner launches SDK workers in Docker containers, the additional pipeline dependencies (specified via `--requirements_file` and other runtime options) are installed into the containers at runtime. This can increase the worker startup time.
+ However, it may be possible to pre-build the SDK containers and perform the dependency installation once before the workers start. To pre-build the container image before pipeline submission, provide the pipeline options mentioned below.
+1. Provide the container engine. We support `local_docker`(requires local installation of Docker) and `cloud_build`(requires a GCP project with Cloud Build API enabled).

Review comment:
       In general, prefer "Beam" to "we", as in "Beam supports..."
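       A short example after this step might also make the two engine values easier to use. The following is only a sketch: `prebuild_args` is a hypothetical helper, and the flag names `--prebuild_sdk_container_engine` and `--docker_registry_push_url` are assumptions to verify against the Python SDK's pipeline options before they go into the doc.

```python
# Engine values named in the step under review.
SUPPORTED_ENGINES = ("local_docker", "cloud_build")


def prebuild_args(engine, push_url=None):
    """Build command-line arguments that request a pre-built SDK container.

    Hypothetical helper. The flag names are assumed, not taken from this PR.
    """
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(
            f"unsupported engine {engine!r}; expected one of {SUPPORTED_ENGINES}"
        )
    args = [f"--prebuild_sdk_container_engine={engine}"]
    # A registry URL is optional in this sketch.
    if push_url is not None:
        args.append(f"--docker_registry_push_url={push_url}")
    return args
```

       The returned list could be appended to the arguments passed when constructing the pipeline options, which keeps the engine check in one place.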



