damccorm commented on code in PR #37360:
URL: https://github.com/apache/beam/pull/37360#discussion_r2728002113
##########
sdks/python/apache_beam/runners/portability/stager.py:
##########
@@ -741,20 +752,25 @@ def _populate_requirements_cache(
# the requirements file and will download package dependencies.
# The apache-beam dependency is excluded from requirements cache
population
- # because we stage the SDK separately.
+ # because we stage the SDK separately.
with tempfile.TemporaryDirectory() as temp_directory:
tmp_requirements_filepath = Stager._remove_dependency_from_requirements(
requirements_file=requirements_file,
dependency_to_remove='apache-beam',
temp_directory_path=temp_directory)
+ # Download to a temporary directory first, then copy to cache.
+ # This allows us to track exactly which packages are needed for this
+ # requirements file.
+ download_dir = tempfile.mkdtemp(dir=temp_directory)
+
cmd_args = [
Stager._get_python_executable(),
'-m',
'pip',
'download',
'--dest',
- cache_dir,
+ download_dir,
Review Comment:
With this change, won't we now have to download every dependency we need
every single time?
Basically, we're trading off a smaller staged cache for a higher per-job pip
download cost. I'm not sure that's a good tradeoff, since the more common use
case is probably usually to use more or less the same set of dependencies every
time.
##########
sdks/python/apache_beam/runners/portability/stager.py:
##########
@@ -245,12 +247,16 @@ def create_job_resources(
'such as --requirements_file. ')
if setup_options.requirements_cache != SKIP_REQUIREMENTS_CACHE:
- (
+ populate_cache_callable = (
populate_requirements_cache if populate_requirements_cache else
Review Comment:
The `populate_requirements_cache` argument is a callable that doesn't return
any results. So I think this doesn't work while respecting that arg.
We may be able to remove that argument entirely since I don't see it used
anywhere outside of `create_and_stage_job_resources`, and I don't see that
function used anymore, but we would want to do that before this change.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]