tvalentyn commented on a change in pull request #16938:
URL: https://github.com/apache/beam/pull/16938#discussion_r819900366



##########
File path: website/www/site/content/en/documentation/runtime/environments.md
##########
@@ -171,6 +171,49 @@ creates a Java 8 SDK image with appropriate licenses in 
`/opt/apache/beam/third_
 
 By default, no licenses/notices are added to the docker images.
 
+#### Build an existing container image to make it compatible with Apache Beam 
Runners {#modify-existing-base-image}
+Beam offers a way to take a Beam container image and customize it. But if you 
have an existing base image to be compatible with Apache Beam Runners, use a 
[multi-stage 
build](https://docs.docker.com/develop/develop-images/multistage-build/) 
process to copy over the necessary artifacts from a default Apache Beam base 
image and provide your custom container image.
+
+
+1. Copy necessary artifacts from Apache Beam base image to your image.
+  ```
+  # This can be any container image,
+ FROM python:3.8-slim

Review comment:
       mismatch between py3.8 and py3.7 below
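
   For illustration, a minimal sketch of a multi-stage build in which both stages use the same Python minor version. The `2.36.0` version and image tag are assumptions, not taken from the PR, and should be checked against the Beam release in use:

   ```dockerfile
   # Any base image works, but its Python minor version (3.8 here)
   # should match the Beam SDK image the artifacts are copied from.
   FROM python:3.8-slim

   # Install the Beam SDK; the version is an assumption and should match the tag below.
   RUN pip install --no-cache-dir apache-beam==2.36.0

   # Copy the Beam boot artifacts from the SDK image built for the same Python version.
   COPY --from=apache/beam_python3.8_sdk:2.36.0 /opt/apache/beam /opt/apache/beam

   # Beam runners start the worker through the boot entrypoint.
   ENTRYPOINT ["/opt/apache/beam/boot"]
   ```

   Keeping the `python:3.8` base and the `beam_python3.8_sdk` source image on the same minor version avoids the kind of mismatch flagged here.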

##########
File path: 
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -45,6 +45,19 @@ If your pipeline uses public packages from the [Python 
Package Index](https://py
     The runner will use the `requirements.txt` file to install your additional 
dependencies onto the remote workers.
 
 **Important:** Remote workers will install all packages listed in the 
`requirements.txt` file. Because of this, it's very important that you delete 
non-PyPI packages from the `requirements.txt` file, as stated in step 2. If you 
don't remove non-PyPI packages, the remote workers will fail when attempting to 
install packages from sources that are unknown to them.
+> **NOTE**: An alternative to `pip check` is to use a library like 
[pip-tools](https://github.com/jazzband/pip-tools) to compile the 
`requirements.txt` with all the dependencies required for the pipeline.
+## Custom Containers {#custom-containers}
+
+You can pass a 
[container](https://hub.docker.com/search?q=apache%2Fbeam&type=image) image 
with all the dependencies that are needed for the pipeline instead of 
`requirements.txt`. [Follow the instructions on how to run pipeline with Custom 
Container 
images](https://beam.apache.org/documentation/runtime/environments/#running-pipelines).
+
+1. If you are passing a custom container image, `--sdk_container_image` at 
runtime and specify `--requirements_file` option, we recommend you to install 
the dependencies from the `--requirements_file` when building your container 
image. In this case, you would reduce the pipeline startup time and do not need 
to pass `--requirements_file` option at runtime.
+
+       # Add these lines with the path to the requirements.txt to the 
Dockerfile
+
+       COPY <path to requirements.txt> /tmp/requirements.txt
+       RUN python -m pip download -r /tmp/requirements.txt
+
+**Note:** [Different 
approaches](https://beam.apache.org/documentation/runtime/environments/#writing-new-dockerfiles)
 to build the container images that would be compatible with Apache Beam 
Runners.

Review comment:
       I don't think this is relevant here

##########
File path: 
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -45,6 +45,19 @@ If your pipeline uses public packages from the [Python 
Package Index](https://py
     The runner will use the `requirements.txt` file to install your additional 
dependencies onto the remote workers.
 
 **Important:** Remote workers will install all packages listed in the 
`requirements.txt` file. Because of this, it's very important that you delete 
non-PyPI packages from the `requirements.txt` file, as stated in step 2. If you 
don't remove non-PyPI packages, the remote workers will fail when attempting to 
install packages from sources that are unknown to them.
+> **NOTE**: An alternative to `pip check` is to use a library like 
[pip-tools](https://github.com/jazzband/pip-tools) to compile the 
`requirements.txt` with all the dependencies required for the pipeline.
+## Custom Containers {#custom-containers}
+
+You can pass a 
[container](https://hub.docker.com/search?q=apache%2Fbeam&type=image) image 
with all the dependencies that are needed for the pipeline instead of 
`requirements.txt`. [Follow the instructions on how to run pipeline with Custom 
Container 
images](https://beam.apache.org/documentation/runtime/environments/#running-pipelines).
+
+1. If you are passing a custom container image, `--sdk_container_image` at 
runtime and specify `--requirements_file` option, we recommend you to install 
the dependencies from the `--requirements_file` when building your container 
image. In this case, you would reduce the pipeline startup time and do not need 
to pass `--requirements_file` option at runtime.
+
+       # Add these lines with the path to the requirements.txt to the 
Dockerfile
+
+       COPY <path to requirements.txt> /tmp/requirements.txt
+       RUN python -m pip download -r /tmp/requirements.txt

Review comment:
       why pip download and not pip install ?
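
   A sketch of the `pip install` variant, for comparison. The base image tag and the location of `requirements.txt` in the build context are assumptions made for illustration:

   ```dockerfile
   # Assumed base image; any Beam-compatible image would do.
   FROM apache/beam_python3.8_sdk:2.36.0

   # Hypothetical path: requirements.txt sits at the root of the build context.
   COPY requirements.txt /tmp/requirements.txt

   # pip install puts the packages into the image's Python environment,
   # whereas pip download would only fetch the archives without installing them.
   RUN python -m pip install --no-cache-dir -r /tmp/requirements.txt
   ```

   With the packages installed at build time, the workers should not need to resolve them again at startup.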

##########
File path: website/www/site/content/en/documentation/runtime/environments.md
##########
@@ -171,6 +171,49 @@ creates a Java 8 SDK image with appropriate licenses in 
`/opt/apache/beam/third_
 
 By default, no licenses/notices are added to the docker images.
 
+#### Build an existing container image to make it compatible with Apache Beam 
Runners {#modify-existing-base-image}

Review comment:
       @emilymye could you PTAL at this section?

##########
File path: 
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -123,3 +136,19 @@ If your pipeline uses non-Python packages (e.g. packages 
that require installati
         --setup_file /path/to/setup.py
 
 **Note:** Because custom commands execute after the dependencies for your 
workflow are installed (by `pip`), you should omit the PyPI package dependency 
from the pipeline's `requirements.txt` file and from the `install_requires` 
parameter in the `setuptools.setup()` call of your `setup.py` file.
+
+## Pre-building SDK container image

Review comment:
       @y1chi could you PTAL at this section?

##########
File path: website/www/site/content/en/documentation/runtime/environments.md
##########
@@ -46,7 +46,7 @@ Beam [SDK container 
images](https://hub.docker.com/search?q=apache%2Fbeam&type=i
 
 1. **[Writing a new](#writing-new-dockerfiles) Dockerfile based on a released 
container image**. This is sufficient for simple additions to the image, such 
as adding artifacts or environment variables.
 2. **[Modifying](#modifying-dockerfiles) a source Dockerfile in 
[Beam](https://github.com/apache/beam)**. This method requires building from 
Beam source but allows for greater customization of the container (including 
replacement of artifacts or base OS/language versions).
-
+3. **[Build](#modify-existing-base-image) an existing container image to make 
it compatible with Apache Beam Runners**. This method is used when users start 
from an existing image, and configure the image to be compatible with Apache 
Beam Runners.

Review comment:
       ```suggestion
   3. **[Modifying](#modify-existing-base-image) an existing container image to 
make it compatible with Apache Beam Runners**. This method is used when users 
start from an existing image, and configure the image to be compatible with 
Apache Beam Runners.
   ```
   
   Also: the intro above should now say "one of three ways".

##########
File path: 
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -45,6 +45,19 @@ If your pipeline uses public packages from the [Python 
Package Index](https://py
     The runner will use the `requirements.txt` file to install your additional 
dependencies onto the remote workers.
 
 **Important:** Remote workers will install all packages listed in the 
`requirements.txt` file. Because of this, it's very important that you delete 
non-PyPI packages from the `requirements.txt` file, as stated in step 2. If you 
don't remove non-PyPI packages, the remote workers will fail when attempting to 
install packages from sources that are unknown to them.
+> **NOTE**: An alternative to `pip check` is to use a library like 
[pip-tools](https://github.com/jazzband/pip-tools) to compile the 
`requirements.txt` with all the dependencies required for the pipeline.

Review comment:
       `pip freeze`, not `pip check`
   
   You can explain:
   
   `...to compile the requirements.txt file with all transitive dependencies from a 
smaller set of requirements.`

##########
File path: 
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -45,6 +45,19 @@ If your pipeline uses public packages from the [Python 
Package Index](https://py
     The runner will use the `requirements.txt` file to install your additional 
dependencies onto the remote workers.
 
 **Important:** Remote workers will install all packages listed in the 
`requirements.txt` file. Because of this, it's very important that you delete 
non-PyPI packages from the `requirements.txt` file, as stated in step 2. If you 
don't remove non-PyPI packages, the remote workers will fail when attempting to 
install packages from sources that are unknown to them.
+> **NOTE**: An alternative to `pip check` is to use a library like 
[pip-tools](https://github.com/jazzband/pip-tools) to compile the 
`requirements.txt` with all the dependencies required for the pipeline.
+## Custom Containers {#custom-containers}
+
+You can pass a 
[container](https://hub.docker.com/search?q=apache%2Fbeam&type=image) image 
with all the dependencies that are needed for the pipeline instead of 
`requirements.txt`. [Follow the instructions on how to run pipeline with Custom 
Container 
images](https://beam.apache.org/documentation/runtime/environments/#running-pipelines).
+
+1. If you are passing a custom container image, `--sdk_container_image` at 
runtime and specify `--requirements_file` option, we recommend you to install 
the dependencies from the `--requirements_file` when building your container 
image. In this case, you would reduce the pipeline startup time and do not need 
to pass `--requirements_file` option at runtime.

Review comment:
       If you are using a custom container image, we recommend that you install 
the dependencies from the `--requirements_file` directly into your image at 
build time. In this case, you do not need to pass the `--requirements_file` option 
at runtime, which will reduce the pipeline startup time. For example:...

##########
File path: website/www/site/content/en/documentation/runtime/environments.md
##########
@@ -171,6 +171,49 @@ creates a Java 8 SDK image with appropriate licenses in 
`/opt/apache/beam/third_
 
 By default, no licenses/notices are added to the docker images.
 
+#### Build an existing container image to make it compatible with Apache Beam 
Runners {#modify-existing-base-image}
+Beam offers a way to take a Beam container image and customize it. But if you 
have an existing base image to be compatible with Apache Beam Runners, use a 
[multi-stage 
build](https://docs.docker.com/develop/develop-images/multistage-build/) 
process to copy over the necessary artifacts from a default Apache Beam base 
image and provide your custom container image.

Review comment:
       ```suggestion
   Beam offers a way to take a Beam container image and customize it. But if 
you have an existing base image that you need to make compatible with Apache 
Beam Runners, use a [multi-stage 
build](https://docs.docker.com/develop/develop-images/multistage-build/) 
process to copy over the necessary artifacts from a default Apache Beam base 
image.
   ```



