tvalentyn commented on a change in pull request #16938:
URL: https://github.com/apache/beam/pull/16938#discussion_r828795562



##########
File path: 
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -123,3 +133,20 @@ If your pipeline uses non-Python packages (e.g. packages 
that require installati
         --setup_file /path/to/setup.py
 
 **Note:** Because custom commands execute after the dependencies for your 
workflow are installed (by `pip`), you should omit the PyPI package dependency 
from the pipeline's `requirements.txt` file and from the `install_requires` 
parameter in the `setuptools.setup()` call of your `setup.py` file.
+
+## Pre-building SDK container image
+
+In the pre-building step, we install pipeline dependencies on the container 
image prior to the job submission. This would speed up the pipeline execution.\

Review comment:
       Something is missing here. Let's add an introductory sentence.
   
   In pipeline execution modes where a Beam runner launches SDK workers in 
Docker containers, the additional pipeline dependencies (specified via 
`--requirements_file` and other runtime options) are installed into the 
containers at runtime. This can increase the worker startup time. However, it 
may be possible to pre-build the SDK containers and perform the dependency 
installation once before the workers start. 

##########
File path: 
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -45,6 +45,16 @@ If your pipeline uses public packages from the [Python 
Package Index](https://py
     The runner will use the `requirements.txt` file to install your additional 
dependencies onto the remote workers.
 
 **Important:** Remote workers will install all packages listed in the 
`requirements.txt` file. Because of this, it's very important that you delete 
non-PyPI packages from the `requirements.txt` file, as stated in step 2. If you 
don't remove non-PyPI packages, the remote workers will fail when attempting to 
install packages from sources that are unknown to them.
+> **NOTE**: An alternative to `pip freeze` is to use a library like 
[pip-tools](https://github.com/jazzband/pip-tools) to compile the all the 
dependencies required for the pipeline from a `--requirements_file`, where only 
top-level dependencies are mentioned.
+## Custom Containers {#custom-containers}

Review comment:
       This may need an extra blank line before line 49.

##########
File path: 
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -123,3 +133,20 @@ If your pipeline uses non-Python packages (e.g. packages 
that require installati
         --setup_file /path/to/setup.py
 
 **Note:** Because custom commands execute after the dependencies for your 
workflow are installed (by `pip`), you should omit the PyPI package dependency 
from the pipeline's `requirements.txt` file and from the `install_requires` 
parameter in the `setuptools.setup()` call of your `setup.py` file.
+
+## Pre-building SDK container image
+
+In the pre-building step, we install pipeline dependencies on the container 
image prior to the job submission. This would speed up the pipeline execution.\
+To use pre-building the dependencies from `requirements.txt` on the container 
image. Follow the steps below.

Review comment:
       ```suggestion
   To pre-build the container image before the pipeline submission, follow the 
steps below.
   ```

##########
File path: 
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -123,3 +136,19 @@ If your pipeline uses non-Python packages (e.g. packages 
that require installati
         --setup_file /path/to/setup.py
 
 **Note:** Because custom commands execute after the dependencies for your 
workflow are installed (by `pip`), you should omit the PyPI package dependency 
from the pipeline's `requirements.txt` file and from the `install_requires` 
parameter in the `setuptools.setup()` call of your `setup.py` file.
+
+## Pre-building SDK container image
+
+In the pre-building step, we install pipeline dependencies on the container 
image prior to the job submission. This would speed up the pipeline execution.\
+To use pre-building the dependencies from `requirements.txt` on the container 
image. Follow the steps below.
+1. Provide the container engine. We support `docker` and 
`cloud_build`(requires a GCP project with Cloud Build API enabled).
+
+       --prebuild_sdk_container_enginer <execution_environment>
+2. To pass a base image for pre-building dependencies, enable this flag. If 
not, apache beam's base image would be used.

Review comment:
       I would remove #2 now that we don't need a special flag.

##########
File path: 
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -123,3 +133,20 @@ If your pipeline uses non-Python packages (e.g. packages 
that require installati
         --setup_file /path/to/setup.py
 
 **Note:** Because custom commands execute after the dependencies for your 
workflow are installed (by `pip`), you should omit the PyPI package dependency 
from the pipeline's `requirements.txt` file and from the `install_requires` 
parameter in the `setuptools.setup()` call of your `setup.py` file.
+
+## Pre-building SDK container image
+
+In the pre-building step, we install pipeline dependencies on the container 
image prior to the job submission. This would speed up the pipeline execution.\
+To use pre-building the dependencies from `requirements.txt` on the container 
image. Follow the steps below.
+1. Provide the container engine. We support `local_docker` and 
`cloud_build`(requires a GCP project with Cloud Build API enabled).
+
+       --prebuild_sdk_container_engine <execution_environment>
+2. To pass a base image for pre-building dependencies, enable this flag. If 
not, apache beam's base image would be used.
+
+       --sdk_container_image <location_to_base_image>
+3. To push the container image, pre-built locally with `local_docker` , to a 
remote repository(eg: docker registry), provide URL to the remote registry by 
passing

Review comment:
       If using `local_docker` engine, provide a URL for the remote registry to 
which the image will be pushed by passing...

##########
File path: 
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -123,3 +133,20 @@ If your pipeline uses non-Python packages (e.g. packages 
that require installati
         --setup_file /path/to/setup.py
 
 **Note:** Because custom commands execute after the dependencies for your 
workflow are installed (by `pip`), you should omit the PyPI package dependency 
from the pipeline's `requirements.txt` file and from the `install_requires` 
parameter in the `setuptools.setup()` call of your `setup.py` file.
+
+## Pre-building SDK container image
+
+In the pre-building step, we install pipeline dependencies on the container 
image prior to the job submission. This would speed up the pipeline execution.\
+To use pre-building the dependencies from `requirements.txt` on the container 
image. Follow the steps below.
+1. Provide the container engine. We support `local_docker` and 
`cloud_build`(requires a GCP project with Cloud Build API enabled).
+
+       --prebuild_sdk_container_engine <execution_environment>
+2. To pass a base image for pre-building dependencies, enable this flag. If 
not, apache beam's base image would be used.
+
+       --sdk_container_image <location_to_base_image>
+3. To push the container image, pre-built locally with `local_docker` , to a 
remote repository(eg: docker registry), provide URL to the remote registry by 
passing
+
+       --docker_registry_push_url <IMAGE_URL>

Review comment:
       I am confused - what is a sample value of this param? Is it supposed to 
be the image name+tag or just the registry?

##########
File path: 
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -123,3 +133,20 @@ If your pipeline uses non-Python packages (e.g. packages 
that require installati
         --setup_file /path/to/setup.py
 
 **Note:** Because custom commands execute after the dependencies for your 
workflow are installed (by `pip`), you should omit the PyPI package dependency 
from the pipeline's `requirements.txt` file and from the `install_requires` 
parameter in the `setuptools.setup()` call of your `setup.py` file.
+
+## Pre-building SDK container image
+
+In the pre-building step, we install pipeline dependencies on the container 
image prior to the job submission. This would speed up the pipeline execution.\
+To use pre-building the dependencies from `requirements.txt` on the container 
image. Follow the steps below.
+1. Provide the container engine. We support `local_docker` and 
`cloud_build`(requires a GCP project with Cloud Build API enabled).

Review comment:
       ```suggestion
   1. Provide the container engine. We support `local_docker` (requires local 
installation of Docker) and `cloud_build` (requires a GCP project with Cloud 
Build API enabled).
   ```

##########
File path: 
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -123,3 +133,20 @@ If your pipeline uses non-Python packages (e.g. packages 
that require installati
         --setup_file /path/to/setup.py
 
 **Note:** Because custom commands execute after the dependencies for your 
workflow are installed (by `pip`), you should omit the PyPI package dependency 
from the pipeline's `requirements.txt` file and from the `install_requires` 
parameter in the `setuptools.setup()` call of your `setup.py` file.
+
+## Pre-building SDK container image
+
+In the pre-building step, we install pipeline dependencies on the container 
image prior to the job submission. This would speed up the pipeline execution.\
+To use pre-building the dependencies from `requirements.txt` on the container 
image. Follow the steps below.
+1. Provide the container engine. We support `local_docker` and 
`cloud_build`(requires a GCP project with Cloud Build API enabled).
+
+       --prebuild_sdk_container_engine <execution_environment>

Review comment:
       ```suggestion
          --prebuild_sdk_container_engine <container_engine>
   ```
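
For illustration only, the flags discussed in this thread could be assembled as in the minimal Python sketch below. The helper `build_prebuild_args` and its parameter names are hypothetical and are not part of Beam; only the flag names and the two engine values (`local_docker`, `cloud_build`) are taken from the diff under review. The expected format of the `--docker_registry_push_url` value is still an open question in this review, so the sketch passes it through unchanged.

```python
from typing import List, Optional

# Engine values named in the documentation under review.
SUPPORTED_ENGINES = ("local_docker", "cloud_build")


def build_prebuild_args(
    container_engine: str,
    requirements_file: str = "requirements.txt",
    registry_push_url: Optional[str] = None,
    base_image: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: assemble pipeline flags for pre-building.

    It only builds a list of strings; it does not call Beam or Docker.
    """
    if container_engine not in SUPPORTED_ENGINES:
        raise ValueError(
            f"unsupported container engine: {container_engine!r}; "
            f"expected one of {SUPPORTED_ENGINES}"
        )

    args = [
        f"--requirements_file={requirements_file}",
        f"--prebuild_sdk_container_engine={container_engine}",
    ]
    # Optional base image; when omitted, the default Beam image is used.
    if base_image:
        args.append(f"--sdk_container_image={base_image}")
    # Passed through as-is; the thread has not settled whether this is a
    # registry URL or a full image name with tag.
    if registry_push_url:
        args.append(f"--docker_registry_push_url={registry_push_url}")
    return args


if __name__ == "__main__":
    # Placeholder value, not a real registry.
    for flag in build_prebuild_args(
        "local_docker", registry_push_url="<registry_url>"
    ):
        print(flag)
```

The resulting list is in the form that the Python SDK's `PipelineOptions` accepts through its `flags` argument, so it could be combined with the pipeline's other options before job submission.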




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

