tvalentyn commented on code in PR #36249:
URL: https://github.com/apache/beam/pull/36249#discussion_r2414842728


##########
sdks/python/apache_beam/runners/portability/stager.py:
##########
@@ -780,7 +785,12 @@ def _populate_requirements_cache(
             platform_tag
         ])
       _LOGGER.info('Executing command: %s', cmd_args)
-      processes.check_output(cmd_args, stderr=processes.STDOUT)
+      output = processes.check_output(cmd_args, stderr=subprocess.STDOUT)
+      downloaded_packages = []
+      for line in output.decode('utf-8').split('\n'):

Review Comment:
   Re how to improve the logic, I looked at the discussion we had on this topic:
   
   https://lists.apache.org/thread/pqc2yl15kjdpxfp3pnocrrhkk3m6gsmh
   
   and there are a couple of ideas:
   
   1) Parse the log output to infer which dependencies were downloaded and which already existed in the cache (this will likely be brittle).
   2) Download twice (https://lists.apache.org/thread/v35bgj67hqrwl4ldymo8bqkybgt3z096), something like the following (I haven't tested it):
   
   ```
   pip download --dest /tmp/dataflow_requirements_cache -r requirements.txt \
     --exists-action i --no-deps
   
   pip download --dest /tmp/temporary_folder_that_will_be_cleaned_up -r requirements.txt \
     --find-links /tmp/dataflow_requirements_cache
   ```
   
   
   Then stage the dependencies from `/tmp/temporary_folder_that_will_be_cleaned_up`.
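
For illustration, the two-pass idea could be sketched in Python roughly as follows. This is a minimal, untested-against-real-pip sketch, not the stager's actual implementation: the helper names `build_download_commands` and `download_for_staging`, and the injectable `run` parameter, are hypothetical and exist only to keep the example self-contained. The pip flags are the same ones used in the commands above.

```python
import subprocess
import sys
import tempfile


def build_download_commands(requirements_file, cache_dir, staging_dir):
  """Returns the two pip invocations as argument lists (hypothetical helper)."""
  pip_download = [sys.executable, '-m', 'pip', 'download']
  # Pass 1: populate the long-lived cache with the listed requirements only,
  # ignoring files that already exist there.
  populate_cache = pip_download + [
      '--dest', cache_dir,
      '-r', requirements_file,
      '--exists-action', 'i',
      '--no-deps',
  ]
  # Pass 2: download the requirements and their dependencies into a scratch
  # directory, offering the cache to pip as an extra place to find packages.
  collect_for_staging = pip_download + [
      '--dest', staging_dir,
      '-r', requirements_file,
      '--find-links', cache_dir,
  ]
  return populate_cache, collect_for_staging


def download_for_staging(
    requirements_file, cache_dir, run=subprocess.check_output):
  """Runs both passes and returns the scratch directory to stage from.

  The caller is responsible for deleting the returned directory.
  """
  staging_dir = tempfile.mkdtemp(prefix='requirements_staging_')
  commands = build_download_commands(
      requirements_file, cache_dir, staging_dir)
  for cmd in commands:
    run(cmd, stderr=subprocess.STDOUT)
  return staging_dir
```

With this shape, only the contents of the returned scratch directory would be staged, so stale packages left in the cache by earlier runs would not be picked up, and no log parsing is needed.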



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
