tvalentyn commented on code in PR #36249:
URL: https://github.com/apache/beam/pull/36249#discussion_r2414842728
##########
sdks/python/apache_beam/runners/portability/stager.py:
##########
@@ -780,7 +785,12 @@ def _populate_requirements_cache(
platform_tag
])
_LOGGER.info('Executing command: %s', cmd_args)
- processes.check_output(cmd_args, stderr=processes.STDOUT)
+ output = processes.check_output(cmd_args, stderr=subprocess.STDOUT)
+ downloaded_packages = []
+ for line in output.decode('utf-8').split('\n'):
Review Comment:
Re your question about how to improve the logic:
I looked at the discussion we had on this topic:
https://lists.apache.org/thread/pqc2yl15kjdpxfp3pnocrrhkk3m6gsmh
and there are a couple of ideas:
1) Parse the pip log output to infer which dependencies were downloaded and
which were already present in the cache (this will likely be brittle).
2) Download twice
(https://lists.apache.org/thread/v35bgj67hqrwl4ldymo8bqkybgt3z096), something
like the following (I haven't tested it):
```
pip download --dest /tmp/dataflow_requirements_cache -r requirements.txt \
    --exists-action i --no-deps
pip download --dest /tmp/temporary_folder_that_will_be_cleaned_up \
    -r requirements.txt --find-links /tmp/dataflow_requirements_cache
```
Then stage the dependencies from /tmp/temporary_folder_that_will_be_cleaned_up.
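
For illustration, the "download twice" idea could be sketched in Python roughly as follows. This is also untested against real requirements files. The function names `build_download_commands` and `download_requirements_twice` and the use of `tempfile.mkdtemp` are hypothetical and are not existing stager.py helpers; the pip flags are the ones from the commands above.

```python
import os
import subprocess
import sys
import tempfile


def build_download_commands(requirements_file, cache_dir, staging_dir):
  """Builds the two pip command lines (hypothetical helper, not in stager.py)."""
  pip_download = [sys.executable, '-m', 'pip', 'download']
  # Pass 1: refresh the persistent cache, ignoring files that already exist.
  populate_cache = pip_download + [
      '--dest', cache_dir, '-r', requirements_file,
      '--exists-action', 'i', '--no-deps'
  ]
  # Pass 2: download into a fresh folder, with the cache added as an extra
  # package source via --find-links.
  collect_for_staging = pip_download + [
      '--dest', staging_dir, '-r', requirements_file,
      '--find-links', cache_dir
  ]
  return populate_cache, collect_for_staging


def download_requirements_twice(requirements_file, cache_dir):
  """Runs both passes and returns the paths of the files to stage."""
  # Assumption: the caller removes this temporary folder after staging.
  staging_dir = tempfile.mkdtemp()
  commands = build_download_commands(requirements_file, cache_dir, staging_dir)
  for cmd_args in commands:
    subprocess.check_output(cmd_args, stderr=subprocess.STDOUT)
  return sorted(
      os.path.join(staging_dir, name) for name in os.listdir(staging_dir))
```

With this shape, the set of files to stage comes from listing the temporary folder rather than from parsing pip's log output, which is the main appeal of option 2.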