tvalentyn commented on a change in pull request #16938:
URL: https://github.com/apache/beam/pull/16938#discussion_r819900366
##########
File path: website/www/site/content/en/documentation/runtime/environments.md
##########
@@ -171,6 +171,49 @@ creates a Java 8 SDK image with appropriate licenses in
`/opt/apache/beam/third_
By default, no licenses/notices are added to the docker images.
+#### Build an existing container image to make it compatible with Apache Beam
Runners {#modify-existing-base-image}
+Beam offers a way to take a Beam container image and customize it. But if you
have an existing base image to be compatible with Apache Beam Runners, use a
[multi-stage
build](https://docs.docker.com/develop/develop-images/multistage-build/)
process to copy over the necessary artifacts from a default Apache Beam base
image and provide your custom container image.
+
+
+1. Copy necessary artifacts from Apache Beam base image to your image.
+ ```
+ # This can be any container image,
+ FROM python:3.8-slim
Review comment:
Mismatch between py3.8 here and py3.7 below.
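For reference, here is a minimal multi-stage sketch that keeps the Python version consistent between the two stages. It rests on several assumptions that you should verify against the image you actually use. It assumes the released `apache/beam_python3.8_sdk` image from Docker Hub. It assumes the Beam release tag is supplied through a hypothetical build argument, `BEAM_VERSION`. It also assumes that, as in the released Python SDK images, the SDK launcher lives under `/opt/apache/beam` with its entrypoint at `/opt/apache/beam/boot`.

```dockerfile
# Assumption: BEAM_VERSION (hypothetical name) is the Beam release the
# pipeline is built with, passed via --build-arg at build time.
ARG BEAM_VERSION

# Stage 1: released Beam SDK image, used only as a source of artifacts.
# Its Python version (3.8) must match the base image in stage 2.
FROM apache/beam_python3.8_sdk:${BEAM_VERSION} AS beam_sdk

# Stage 2: the existing base image (any image that ships Python 3.8).
FROM python:3.8-slim

# An ARG declared before the first FROM must be redeclared inside a stage.
ARG BEAM_VERSION

# The SDK harness started by the launcher needs the Beam SDK itself,
# at the same version as the image the artifacts are copied from.
RUN python -m pip install --no-cache-dir apache-beam==${BEAM_VERSION}

# Copy the SDK launcher and its supporting files from the Beam image.
COPY --from=beam_sdk /opt/apache/beam /opt/apache/beam

# Start containers with the Beam SDK launcher.
ENTRYPOINT ["/opt/apache/beam/boot"]
```

To build it, run a command such as `docker build --build-arg BEAM_VERSION=<release> -t <your-image> .`, replacing both placeholders. Naming the Beam stage (`AS beam_sdk`) keeps the `COPY --from` line independent of the tag.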
##########
File path:
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -45,6 +45,19 @@ If your pipeline uses public packages from the [Python
Package Index](https://py
The runner will use the `requirements.txt` file to install your additional
dependencies onto the remote workers.
**Important:** Remote workers will install all packages listed in the
`requirements.txt` file. Because of this, it's very important that you delete
non-PyPI packages from the `requirements.txt` file, as stated in step 2. If you
don't remove non-PyPI packages, the remote workers will fail when attempting to
install packages from sources that are unknown to them.
+> **NOTE**: An alternative to `pip check` is to use a library like
[pip-tools](https://github.com/jazzband/pip-tools) to compile the
`requirements.txt` with all the dependencies required for the pipeline.
+## Custom Containers {#custom-containers}
+
+You can pass a
[container](https://hub.docker.com/search?q=apache%2Fbeam&type=image) image
with all the dependencies that are needed for the pipeline instead of
`requirements.txt`. [Follow the instructions on how to run pipeline with Custom
Container
images](https://beam.apache.org/documentation/runtime/environments/#running-pipelines).
+
+1. If you are passing a custom container image, `--sdk_container_image` at
runtime and specify `--requirements_file` option, we recommend you to install
the dependencies from the `--requirements_file` when building your container
image. In this case, you would reduce the pipeline startup time and do not need
to pass `--requirements_file` option at runtime.
+
+ # Add these lines with the path to the requirements.txt to the
Dockerfile
+
+ COPY <path to requirements.txt> /tmp/requirements.txt
+ RUN python -m pip download -r /tmp/requirements.txt
+
+**Note:** [Different
approaches](https://beam.apache.org/documentation/runtime/environments/#writing-new-dockerfiles)
to build the container images that would be compatible with Apache Beam
Runners.
Review comment:
I don't think this is relevant here
##########
File path:
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -45,6 +45,19 @@ If your pipeline uses public packages from the [Python
Package Index](https://py
+ # Add these lines with the path to the requirements.txt to the
Dockerfile
+
+ COPY <path to requirements.txt> /tmp/requirements.txt
+ RUN python -m pip download -r /tmp/requirements.txt
Review comment:
Why `pip download` and not `pip install`?
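For comparison, here is a sketch of the `pip install` variant the question points at. It assumes the image extends a released Beam Python SDK image, again using the hypothetical build argument `BEAM_VERSION`, and that `requirements.txt` sits in the build context. `pip download` only fetches the distributions into the working directory without installing them. `pip install` makes the packages importable inside the image, which is what allows the pipeline to skip `--requirements_file` at runtime.

```dockerfile
# Assumption: BEAM_VERSION (hypothetical name) matches the Beam release
# used to launch the pipeline.
ARG BEAM_VERSION
FROM apache/beam_python3.8_sdk:${BEAM_VERSION}

# Copy the pipeline's requirements into the image.
COPY requirements.txt /tmp/requirements.txt

# Install (not just download) the dependencies so they are importable
# by the SDK harness when the container starts.
RUN python -m pip install --no-cache-dir -r /tmp/requirements.txt
```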
##########
File path: website/www/site/content/en/documentation/runtime/environments.md
##########
@@ -171,6 +171,49 @@ creates a Java 8 SDK image with appropriate licenses in
`/opt/apache/beam/third_
By default, no licenses/notices are added to the docker images.
+#### Build an existing container image to make it compatible with Apache Beam
Runners {#modify-existing-base-image}
Review comment:
@emilymye could you PTAL at this section?
##########
File path:
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -123,3 +136,19 @@ If your pipeline uses non-Python packages (e.g. packages
that require installati
--setup_file /path/to/setup.py
**Note:** Because custom commands execute after the dependencies for your
workflow are installed (by `pip`), you should omit the PyPI package dependency
from the pipeline's `requirements.txt` file and from the `install_requires`
parameter in the `setuptools.setup()` call of your `setup.py` file.
+
+## Pre-building SDK container image
Review comment:
@y1chi could you PTAL at this section?
##########
File path: website/www/site/content/en/documentation/runtime/environments.md
##########
@@ -46,7 +46,7 @@ Beam [SDK container
images](https://hub.docker.com/search?q=apache%2Fbeam&type=i
1. **[Writing a new](#writing-new-dockerfiles) Dockerfile based on a released
container image**. This is sufficient for simple additions to the image, such
as adding artifacts or environment variables.
2. **[Modifying](#modifying-dockerfiles) a source Dockerfile in
[Beam](https://github.com/apache/beam)**. This method requires building from
Beam source but allows for greater customization of the container (including
replacement of artifacts or base OS/language versions).
-
+3. **[Build](#modify-existing-base-image) an existing container image to make
it compatible with Apache Beam Runners**. This method is used when users start
from an existing image, and configure the image to be compatible with Apache
Beam Runners.
Review comment:
```suggestion
3. **[Modifying](#modify-existing-base-image) an existing container image to
make it compatible with Apache Beam Runners**. This method is used when users
start from an existing image, and configure the image to be compatible with
Apache Beam Runners.
```
Also: update the text above to say "one of three ways".
##########
File path:
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -45,6 +45,19 @@ If your pipeline uses public packages from the [Python
Package Index](https://py
+> **NOTE**: An alternative to `pip check` is to use a library like
[pip-tools](https://github.com/jazzband/pip-tools) to compile the
`requirements.txt` with all the dependencies required for the pipeline.
Review comment:
`pip freeze`, not `pip check`.
You can explain:
"...to compile the `requirements.txt` with all transitive dependencies from a
smaller set of requirements."
##########
File path:
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -45,6 +45,19 @@ If your pipeline uses public packages from the [Python
Package Index](https://py
+1. If you are passing a custom container image, `--sdk_container_image` at
runtime and specify `--requirements_file` option, we recommend you to install
the dependencies from the `--requirements_file` when building your container
image. In this case, you would reduce the pipeline startup time and do not need
to pass `--requirements_file` option at runtime.
Review comment:
If you are using a custom container image, we recommend that you install
the dependencies from the `--requirements_file` directly into your image at
build time. In this case, you do not need to pass the `--requirements_file`
option at runtime, which will reduce the pipeline startup time. For example:...
##########
File path: website/www/site/content/en/documentation/runtime/environments.md
##########
@@ -171,6 +171,49 @@ creates a Java 8 SDK image with appropriate licenses in
`/opt/apache/beam/third_
By default, no licenses/notices are added to the docker images.
+#### Build an existing container image to make it compatible with Apache Beam
Runners {#modify-existing-base-image}
+Beam offers a way to take a Beam container image and customize it. But if you
have an existing base image to be compatible with Apache Beam Runners, use a
[multi-stage
build](https://docs.docker.com/develop/develop-images/multistage-build/)
process to copy over the necessary artifacts from a default Apache Beam base
image and provide your custom container image.
Review comment:
```suggestion
Beam offers a way to take a Beam container image and customize it. But if
you have an existing base image that you need to make compatible with Apache
Beam Runners, use a [multi-stage
build](https://docs.docker.com/develop/develop-images/multistage-build/)
process to copy over the necessary artifacts from a default Apache Beam base
image.
```