damccorm opened a new issue, #21073:
URL: https://github.com/apache/beam/issues/21073

   There are a few issues:
   
   1) Including Beam itself in requirements.txt causes unnecessary friction
and is suboptimal: Beam already stages itself to the workers, and Beam workers
already include Beam's dependencies. This is not clear from
https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/, yet
from a user's perspective, including Beam in requirements.txt seems natural.
   
   2) Staging the sources of all dependencies listed in requirements.txt, and
of their transitive dependencies, in some cases involves a hidden package
recompilation initiated by pip. The reason is that pip cannot reliably
identify the dependencies of a package without recompiling it in certain
cases; see [1-3] for pointers. This increases the time it takes to launch a
Beam job, and may require additional software (such as Linux packages with
header libraries, or gcc dependencies) to be available. This causes friction
and confusion, is not obvious to users, and is beyond Beam's control.
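
   As an illustration of a possible user-side workaround for 1), here is a
minimal sketch that drops the Beam SDK entry from a requirements file before
it is handed to the pipeline. The helper name `strip_beam_requirement` and the
matching pattern are hypothetical and not part of Beam; the sketch only
handles plain requirement lines, not `-r` includes or other pip options.

```python
import re

# Hypothetical pattern (an assumption, not Beam code): matches a requirement
# line for the Beam SDK itself, e.g. "apache-beam", "apache_beam>=2.30" or
# "apache-beam[gcp]==2.30.0", but not other packages such as
# "apache-beam-extras".
_BEAM_LINE = re.compile(
    r"^\s*apache[-_.]beam(\[[^\]]*\])?\s*([=<>!~;@].*)?$",
    re.IGNORECASE,
)


def strip_beam_requirement(lines):
    """Return the requirement lines with any apache-beam entry removed."""
    kept = []
    for line in lines:
        # Ignore trailing comments when deciding whether the line names Beam.
        requirement = line.split("#", 1)[0].strip()
        if requirement and _BEAM_LINE.match(requirement):
            continue
        kept.append(line)
    return kept
```

   The filtered lines could then be written to a separate file and passed
via `--requirements_file`, so that only non-Beam dependencies are staged.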
   
   [1] https://github.com/pypa/pip/issues/8387
   [2] https://github.com/pypa/pip/issues/7995
   [3] https://discuss.python.org/t/pip-download-just-the-source-packages-no-building-no-metadata-etc/4651
   
   Imported from Jira 
[BEAM-12555](https://issues.apache.org/jira/browse/BEAM-12555). Original Jira 
may contain additional context.
   Reported by: tvalentyn.

