This is an automated email from the ASF dual-hosted git repository.

tvalentyn pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
     new 00526dd3fa2 Document requirements cache behavior differences. (#30493)
00526dd3fa2 is described below

commit 00526dd3fa2a8fa9168cdb7f94c86af519f5c678
Author: tvalentyn <tvalen...@users.noreply.github.com>
AuthorDate: Fri Mar 8 20:15:04 2024 -0800

    Document requirements cache behavior differences. (#30493)
    
    * Revise documentation on managing dependencies.
    
    * Describe the staging of requirements cache behavior.
    
    * Document that setup_file option doesn't stage dependencies.
    
    * Apply suggestions from code review
    
    Co-authored-by: Rebecca Szper <98840847+rsz...@users.noreply.github.com>
    
    ---------
    
    Co-authored-by: Rebecca Szper <98840847+rsz...@users.noreply.github.com>
---
 .../sdks/python-pipeline-dependencies.md            | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md b/website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
index 2261c92eef0..286e90a3705 100644
--- a/website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
+++ b/website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
@@ -46,11 +46,21 @@ To supply a requirements.txt file:
 
     The runner will use the `requirements.txt` file to install your additional dependencies onto the remote workers.
 
-> **NOTE**: An alternative to `pip freeze` is to use a library like [pip-tools](https://github.com/jazzband/pip-tools) to compile all the dependencies required for the pipeline from a `--requirements_file`, where only top-level dependencies are mentioned.
+> **NOTE**: As an alternative to `pip freeze`, use a library like [pip-tools](https://github.com/jazzband/pip-tools) to compile all of the dependencies required for the pipeline from a `requirements.in` file. In the `requirements.in` file, only the top-level dependencies are mentioned.
+
+When you supply the `--requirements_file` pipeline option, during pipeline submission, Beam downloads
+the specified packages locally into a requirements cache directory,
+and then stages the requirements cache directory to the runner.
+At runtime, when available, Beam installs packages from the requirements cache.
+This mechanism makes it possible to stage the dependency packages to the runner
+at submission. At runtime, the runner workers might be able to install the
+packages from the cache without needing a connection to PyPI. To disable staging the
+requirements, use the `--requirements_cache=skip` pipeline option.
+For more information, see the [help descriptions of these pipeline options](https://beam.apache.org/releases/pydoc/current/_modules/apache_beam/options/pipeline_options.html#SetupOptions).
 
 ## Custom Containers {#custom-containers}
 
-You can pass a [container](https://hub.docker.com/search?q=apache%2Fbeam&type=image) image with all the dependencies that are needed for the pipeline instead of `requirements.txt`. [Follow the instructions on how to run pipeline with Custom Container images](/documentation/runtime/environments/#running-pipelines).
+You can pass a [container](https://hub.docker.com/search?q=apache%2Fbeam&type=image) image with all the dependencies that are needed for the pipeline. [Follow the instructions that show how to run the pipeline with custom container images](/documentation/runtime/environments/#running-pipelines).
 
 1. If you are using a custom container image, we recommend that you install the dependencies from the `--requirements_file` directly into your image at build time. In this case, you do not need to pass `--requirements_file` option at runtime, which will reduce the pipeline startup time.
 
@@ -118,7 +128,10 @@ Often, your pipeline code spans multiple files. To run your project remotely, yo
 
         --setup_file /path/to/setup.py
 
-**Note:** If you [created a requirements.txt file](#pypi-dependencies) and your project spans multiple files, you can get rid of the `requirements.txt` file and instead, add all packages contained in `requirements.txt` to the `install_requires` field of the setup call (in step 1).
+**Note:** It is not necessary to supply the `--requirements_file` [option](#pypi-dependencies) if the dependencies of your package are defined in the `install_requires` field of the `setup.py` file (see step 1).
+However, unlike with the `--requirements_file` option, when you use the `--setup_file` option, Beam doesn't stage the dependent packages to the runner.
+Only the pipeline package is staged. If they aren't already provided in the runtime environment,
+the package dependencies are installed from PyPI at runtime.
 
 
 ## Non-Python Dependencies or PyPI Dependencies with Non-Python Dependencies {#nonpython}
@@ -132,7 +145,7 @@ Otherwise, you must perform the following steps.
 
     **Note:** You must verify that these commands run on the remote worker. For example, if you use `apt`, the remote worker needs `apt` support.
 
-4. Run your pipeline with the following command-line option:
+3. Run your pipeline with the following command-line option:
 
         --setup_file /path/to/setup.py
 
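The requirements cache mechanism described in the first hunk can be approximated with two standard `pip` invocations. The following Python sketch only builds the command lines. It is not Beam's stager code, the helper names `build_download_command` and `build_install_command` are hypothetical, and the exact flags Beam passes to `pip` may differ. At submission, `pip download --dest` fetches the packages listed in the requirements file into a local cache directory. On a worker, `pip install --find-links` can pick them up from the staged copy of that directory, and adding `--no-index` keeps `pip` from contacting PyPI at all.

```python
import sys
from typing import List


def build_download_command(requirements_file: str, cache_dir: str) -> List[str]:
    """Hypothetical helper for the submission-time step.

    Downloads (without installing) the packages listed in requirements_file
    into cache_dir, which could then be staged to the runner.
    """
    return [
        sys.executable, "-m", "pip", "download",
        "--dest", cache_dir,
        "-r", requirements_file,
    ]


def build_install_command(
    requirements_file: str, cache_dir: str, offline: bool = False
) -> List[str]:
    """Hypothetical helper for the runtime step on a worker.

    --find-links adds cache_dir as a location to search for distributions.
    Unless offline is True (which appends --no-index), pip also consults
    the package index (PyPI by default) in addition to cache_dir.
    """
    cmd = [
        sys.executable, "-m", "pip", "install",
        "-r", requirements_file,
        "--find-links", cache_dir,
    ]
    if offline:
        cmd.append("--no-index")
    return cmd


if __name__ == "__main__":
    # Only print the commands; pass a list to subprocess.run(cmd, check=True)
    # to actually execute it.
    print(build_download_command("requirements.txt", "/tmp/requirements-cache"))
    print(build_install_command("requirements.txt", "/tmp/requirements-cache", offline=True))
```

In this sketch, `offline=True` corresponds to the case the documentation describes, where workers install from the cache without a connection to PyPI. That only succeeds if the cache holds a compatible distribution of every required package for the worker's platform and Python version.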