[ 
https://issues.apache.org/jira/browse/BEAM-13314?focusedWorklogId=745960&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-745960
 ]

ASF GitHub Bot logged work on BEAM-13314:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 22/Mar/22 17:30
            Start Date: 22/Mar/22 17:30
    Worklog Time Spent: 10m 
      Work Description: tvalentyn commented on a change in pull request #16938:
URL: https://github.com/apache/beam/pull/16938#discussion_r832426684



##########
File path: 
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -123,3 +134,25 @@ If your pipeline uses non-Python packages (e.g. packages 
that require installati
         --setup_file /path/to/setup.py
 
 **Note:** Because custom commands execute after the dependencies for your 
workflow are installed (by `pip`), you should omit the PyPI package dependency 
from the pipeline's `requirements.txt` file and from the `install_requires` 
parameter in the `setuptools.setup()` call of your `setup.py` file.
+
+## Pre-building SDK container image
+
+In pipeline execution modes where a Beam runner launches SDK workers in Docker 
containers, the additional pipeline dependencies (specified via 
`--requirements_file` and other runtime options) are installed into the 
containers at runtime. This can increase the worker startup time.
+ However, it may be possible to pre-build the SDK containers and perform the 
dependency installation once before the workers start. To pre-build the 
container image before pipeline submission, provide the pipeline options 
mentioned below.
+1. Provide the container engine. Beam supports `local_docker`(requires local 
installation of Docker) and `cloud_build`(requires a GCP project with Cloud 
Build API enabled).
+
+       --prebuild_sdk_container_engine=<container_engine>
+2. To pass a base image for pre-building dependencies, provide 
`--sdk_container_image`. If not, Apache beam's base 
[image](https://hub.docker.com/search?q=apache%2Fbeam&type=image) would be used.
+
+       --sdk_container_image=<location_to_base_image>
+3. If using `local_docker` engine, provide a URL for the remote registry to 
which the image will be pushed by passing
+
+       --docker_registry_push_url=<remote_registry_url>
+       # Example: --docker_registry_push_url=<registry_name>/beam
+       # pre-built image will be pushed to the <registry_name>/beam/beam_python_prebuilt_sdk:<unique_image_tag>
+       # <unique_image_tag> tag is generated by Beam SDK.
+
+   **NOTE:** `docker_registry_push_url` must be a remote registry.
+> To use Docker, the `--sdk_container_image` should be compatible with Apache 
Beam Runner. Please follow the 
[instructions](https://beam.apache.org/documentation/runtime/environments/#building-and-pushing-custom-containers)
 on how to build a base container image compatible with Apache Beam.
+

Review comment:
       Suggestion to add to the notes:
   
   The pre-building feature requires the Apache Beam SDK for Python, version 
2.25.0 or later.
   
   The container images created during prebuilding will persist beyond the 
pipeline runtime.
   Once your job is finished or stopped, you can remove the pre-built image 
from the container registry.
   
   If your pipeline uses a custom container image, you will most likely not 
benefit from the prebuilding step, because extra dependencies can be 
preinstalled in the custom image at build time. If you would still like to use 
prebuilding with custom images, use Apache Beam SDK 2.38.0 or newer and supply 
your custom image via the `--sdk_container_image` pipeline option. 
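
A minimal sketch of how the prebuilding options discussed above could be assembled into a flag list, in case it helps readers of the page. The helper name `prebuild_flags` and the example registry value are hypothetical. Only the flag names and the two engine names (`local_docker`, `cloud_build`) come from the text under review.

```python
from typing import List, Optional

# Engine names taken from the documentation text under review.
SUPPORTED_ENGINES = ("local_docker", "cloud_build")


def prebuild_flags(
    engine: str,
    docker_registry_push_url: Optional[str] = None,
    sdk_container_image: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build the prebuilding flags as a list of strings."""
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported container engine: {engine!r}")
    flags = [f"--prebuild_sdk_container_engine={engine}"]
    if docker_registry_push_url:
        # Registry the pre-built image is pushed to (placeholder value below).
        flags.append(f"--docker_registry_push_url={docker_registry_push_url}")
    if sdk_container_image:
        # Optional base image to pre-build the dependencies on top of.
        flags.append(f"--sdk_container_image={sdk_container_image}")
    return flags


if __name__ == "__main__":
    # "example.registry/beam" is a made-up registry path for illustration.
    print(" ".join(prebuild_flags(
        "local_docker", docker_registry_push_url="example.registry/beam")))
```

The resulting list is plain strings, so it could be appended to the command line or, assuming the Beam Python SDK is installed, passed to `PipelineOptions(flags)` together with the other pipeline options.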
   

##########
File path: 
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -123,3 +134,25 @@ If your pipeline uses non-Python packages (e.g. packages 
that require installati
         --setup_file /path/to/setup.py
 
 **Note:** Because custom commands execute after the dependencies for your 
workflow are installed (by `pip`), you should omit the PyPI package dependency 
from the pipeline's `requirements.txt` file and from the `install_requires` 
parameter in the `setuptools.setup()` call of your `setup.py` file.
+
+## Pre-building SDK container image
+
+In pipeline execution modes where a Beam runner launches SDK workers in Docker 
containers, the additional pipeline dependencies (specified via 
`--requirements_file` and other runtime options) are installed into the 
containers at runtime. This can increase the worker startup time.
+ However, it may be possible to pre-build the SDK containers and perform the 
dependency installation once before the workers start. To pre-build the 
container image before pipeline submission, provide the pipeline options 
mentioned below.
+1. Provide the container engine. Beam supports `local_docker`(requires local 
installation of Docker) and `cloud_build`(requires a GCP project with Cloud 
Build API enabled).
+
+       --prebuild_sdk_container_engine=<container_engine>
+2. To pass a base image for pre-building dependencies, provide 
`--sdk_container_image`. If not, Apache beam's base 
[image](https://hub.docker.com/search?q=apache%2Fbeam&type=image) would be used.

Review comment:
       As discussed offline, let's remove this and line 156.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 745960)
    Time Spent: 7h 50m  (was: 7h 40m)

> Revise recommendations to manage Python pipeline dependencies. 
> ---------------------------------------------------------------
>
>                 Key: BEAM-13314
>                 URL: https://issues.apache.org/jira/browse/BEAM-13314
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core, website
>            Reporter: Valentyn Tymofieiev
>            Assignee: Anand Inguva
>            Priority: P2
>              Labels: usability
>          Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> The page  
> https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/ 
> recommends managing Python dependencies via requirements files.
> This approach is currently inefficient in light of the introduction and 
> adoption of PEP-517 by some packages; see 
> https://lists.apache.org/thread/trljnxo39c0cmff790yff3h8n5okqt3q and the 
> rest of the thread. The page also does not mention Custom Containers or SDK 
> prebuilding workflows.
>  
> We should revise it and document best practices.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
