AnandInguva commented on a change in pull request #16938:
URL: https://github.com/apache/beam/pull/16938#discussion_r820906597
##########
File path: website/www/site/content/en/documentation/runtime/environments.md
##########
@@ -171,6 +171,49 @@ creates a Java 8 SDK image with appropriate licenses in
`/opt/apache/beam/third_
By default, no licenses/notices are added to the docker images.
+#### Modifying an existing container image to make it compatible with Apache Beam
Runners {#modify-existing-base-image}
+Beam offers a way to take a Beam container image and customize it. If you
instead have an existing base image that must be made compatible with Apache Beam Runners, use a
[multi-stage
build](https://docs.docker.com/develop/develop-images/multistage-build/)
process to copy the necessary artifacts from a default Apache Beam base
image into your custom container image.
+
+
+1. Copy the necessary artifacts from the Apache Beam base image to your image.
+ ```
+ # This can be any container image,
+ FROM python:3.8-slim
Review comment:
Thanks for catching
##########
File path:
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -45,6 +45,19 @@ If your pipeline uses public packages from the [Python
Package Index](https://py
The runner will use the `requirements.txt` file to install your additional
dependencies onto the remote workers.
**Important:** Remote workers will install all packages listed in the
`requirements.txt` file. Because of this, it's very important that you delete
non-PyPI packages from the `requirements.txt` file, as stated in step 2. If you
don't remove non-PyPI packages, the remote workers will fail when attempting to
install packages from sources that are unknown to them.
+> **NOTE**: An alternative to `pip check` is to use a library like
[pip-tools](https://github.com/jazzband/pip-tools) to compile the
`requirements.txt` with all the dependencies required for the pipeline.
+## Custom Containers {#custom-containers}
+
+You can pass a
[container](https://hub.docker.com/search?q=apache%2Fbeam&type=image) image
with all the dependencies that are needed for the pipeline instead of
`requirements.txt`. [Follow the instructions on how to run a pipeline with custom
container
images](https://beam.apache.org/documentation/runtime/environments/#running-pipelines).
+
+1. If you pass a custom container image with `--sdk_container_image` at
runtime and also specify the `--requirements_file` option, we recommend installing
the dependencies from the requirements file when building your container
image. This reduces pipeline startup time, and you no longer need
to pass the `--requirements_file` option at runtime.
Review comment:
Changed it
##########
File path:
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -45,6 +45,19 @@ If your pipeline uses public packages from the [Python
Package Index](https://py
The runner will use the `requirements.txt` file to install your additional
dependencies onto the remote workers.
**Important:** Remote workers will install all packages listed in the
`requirements.txt` file. Because of this, it's very important that you delete
non-PyPI packages from the `requirements.txt` file, as stated in step 2. If you
don't remove non-PyPI packages, the remote workers will fail when attempting to
install packages from sources that are unknown to them.
+> **NOTE**: An alternative to `pip check` is to use a library like
[pip-tools](https://github.com/jazzband/pip-tools) to compile the
`requirements.txt` with all the dependencies required for the pipeline.
+## Custom Containers {#custom-containers}
+
+You can pass a
[container](https://hub.docker.com/search?q=apache%2Fbeam&type=image) image
with all the dependencies that are needed for the pipeline instead of
`requirements.txt`. [Follow the instructions on how to run a pipeline with custom
container
images](https://beam.apache.org/documentation/runtime/environments/#running-pipelines).
+
+1. If you pass a custom container image with `--sdk_container_image` at
runtime and also specify the `--requirements_file` option, we recommend installing
the dependencies from the requirements file when building your container
image. This reduces pipeline startup time, and you no longer need
to pass the `--requirements_file` option at runtime.
+
+ # Add these lines with the path to the requirements.txt to the
Dockerfile
+
+ COPY <path to requirements.txt> /tmp/requirements.txt
+ RUN python -m pip install -r /tmp/requirements.txt
+
+**Note:** See the [different
approaches](https://beam.apache.org/documentation/runtime/environments/#writing-new-dockerfiles)
for building container images that are compatible with Apache Beam
Runners.
Review comment:
I thought maybe referencing how to use a custom container would be
useful, but thinking about it, you are right.
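For illustration, the recommendation in the hunk above (bake the requirements into the image, then drop `--requirements_file` at launch) could be sketched as below. This is a minimal sketch, not part of the PR: `build_pipeline_args` is a hypothetical helper, the image name is a placeholder, and it assumes the custom image already has every package from the requirements file installed.

```python
from typing import List, Optional


def build_pipeline_args(
    sdk_container_image: Optional[str] = None,
    requirements_file: Optional[str] = None,
) -> List[str]:
    """Return the dependency-related launch flags (hypothetical helper)."""
    args: List[str] = []
    if sdk_container_image:
        # Assumption: the dependencies were installed while building the
        # image, so --requirements_file is intentionally left out.
        args.append(f"--sdk_container_image={sdk_container_image}")
    elif requirements_file:
        # No custom image: the workers install the packages at startup.
        args.append(f"--requirements_file={requirements_file}")
    return args


if __name__ == "__main__":
    # Placeholder image name, used only for illustration.
    print(build_pipeline_args(
        sdk_container_image="example.com/my-beam-image:latest",
        requirements_file="requirements.txt",
    ))
```

The resulting list could then be handed to the pipeline together with the other launch flags, for example through Beam's `PipelineOptions`.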
##########
File path: website/www/site/content/en/documentation/runtime/environments.md
##########
@@ -46,7 +46,7 @@ Beam [SDK container
images](https://hub.docker.com/search?q=apache%2Fbeam&type=i
1. **[Writing a new](#writing-new-dockerfiles) Dockerfile based on a released
container image**. This is sufficient for simple additions to the image, such
as adding artifacts or environment variables.
2. **[Modifying](#modifying-dockerfiles) a source Dockerfile in
[Beam](https://github.com/apache/beam)**. This method requires building from
Beam source but allows for greater customization of the container (including
replacement of artifacts or base OS/language versions).
-
+3. **[Modifying](#modify-existing-base-image) an existing container image to make
it compatible with Apache Beam Runners**. Use this method when you start
from an existing image and need to configure it to be compatible with Apache
Beam Runners.
Review comment:
Done
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.