damccorm opened a new issue, #21073: URL: https://github.com/apache/beam/issues/21073
There are a few issues:

1) Including Beam itself in requirements.txt causes unnecessary friction and is suboptimal, because Beam already takes care of staging itself to the workers, and Beam workers already include Beam's dependencies. This is not clear from https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/, yet from a user's perspective, including Beam in requirements.txt seems natural.

2) Staging the sources of all dependencies listed in requirements.txt, and of their transitive dependencies, in some cases involves a hidden package recompilation initiated by pip. The reason is that, in certain cases, pip cannot reliably identify the dependencies of a package without recompiling it; see [1-3] for pointers. This increases the time it takes to launch a Beam job and may require additional software (such as Linux packages with header libraries, or gcc dependencies) to be available. This causes friction and confusion, is not obvious, and is beyond Beam's control.

[1] https://github.com/pypa/pip/issues/8387
[2] https://github.com/pypa/pip/issues/7995
[3] https://discuss.python.org/t/pip-download-just-the-source-packages-no-building-no-metadata-etc/4651

Imported from Jira [BEAM-12555](https://issues.apache.org/jira/browse/BEAM-12555). Original Jira may contain additional context.
Reported by: tvalentyn.
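As a possible user-side workaround for point 1, here is a minimal sketch that drops the Beam entry from a list of requirements.txt lines before the file is handed to the pipeline. The helper name `strip_beam_requirement` and the regular expression are assumptions made for illustration; they are not part of the Beam SDK, and the matching is a simplification that only covers plain `apache-beam` entries (with optional extras, version specifiers, or environment markers), not every form pip accepts.

```python
import re

# Hypothetical helper, not part of the Beam SDK.
# Matches a requirement line naming apache-beam itself (any of the
# equivalent spellings apache-beam / apache_beam / apache.beam), optionally
# followed by extras such as [gcp] and a version specifier, marker, or URL.
_BEAM_REQUIREMENT = re.compile(
    r"^\s*apache[-_.]beam(\[[^\]]*\])?\s*([<>=!~;@].*)?$",
    re.IGNORECASE,
)


def strip_beam_requirement(lines):
    """Return the requirement lines with any apache-beam entry removed.

    Comments, blank lines, and all other requirements are kept unchanged.
    """
    kept = []
    for line in lines:
        # Ignore a trailing comment when deciding whether the line names Beam.
        requirement = line.split("#", 1)[0].strip()
        if _BEAM_REQUIREMENT.match(requirement):
            continue
        kept.append(line)
    return kept
```

The filtered lines could then be written to a separate file and passed to the pipeline through `--requirements_file`, so that pip is not asked to download and possibly rebuild Beam and its transitive dependencies. This does not address point 2 for the remaining packages.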
