[ https://issues.apache.org/jira/browse/BEAM-12555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Beam JIRA Bot updated BEAM-12555:
---------------------------------
Labels: stale-P2 (was: )
> Revisit process of dependency staging in Beam Python
> ----------------------------------------------------
>
> Key: BEAM-12555
> URL: https://issues.apache.org/jira/browse/BEAM-12555
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Valentyn Tymofieiev
> Priority: P2
> Labels: stale-P2
>
> There are a few issues:
> 1) Including Beam itself in requirements.txt causes unnecessary friction and
> is suboptimal, because Beam already stages itself to the workers, and Beam
> workers already include Beam's dependencies. This is not clear from
> https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/. Yet
> from a user's perspective, including Beam in requirements.txt seems natural.
> 2) Staging the sources of all dependencies mentioned in requirements.txt, and
> their transitive dependencies, in some cases involves a hidden package
> recompilation initiated by pip. The reason is that, in certain cases, pip
> cannot reliably identify the dependencies of a package without recompiling
> it; see [1-3] for pointers. This increases the time it takes to launch a Beam
> job and may require additional software (such as Linux packages with header
> libraries or gcc deps) to be available. This causes friction and confusion,
> is not obvious, and is beyond Beam's control.
> [1] https://github.com/pypa/pip/issues/8387
> [2] https://github.com/pypa/pip/issues/7995
> [3] https://discuss.python.org/t/pip-download-just-the-source-packages-no-building-no-metadata-etc/4651
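
To make the two points above concrete, here is a minimal sketch, not Beam's actual staging code. The helper names drop_beam_requirement and source_download_command are hypothetical. The first function filters an apache-beam entry out of a list of requirement lines (point 1). The second only builds, and does not run, a source-only pip download command; it is assumed to resemble the kind of invocation that can trigger the hidden builds described in point 2. The pip flags used (--dest, -r, --no-binary :all:) are standard pip options.

```python
import re
import sys

# Hypothetical helpers for illustration; this is not Beam's staging code.
# Matches "apache-beam", "apache_beam", or "apache.beam", optionally followed
# by extras such as "[gcp]" and a version specifier, marker, or comment.
_BEAM_REQ = re.compile(
    r"^\s*apache[-_.]beam(\s*\[[^\]]*\])?\s*($|[=<>!~;@\s])",
    re.IGNORECASE,
)


def drop_beam_requirement(lines):
    """Return the requirement lines with any apache-beam entry removed."""
    return [line for line in lines if not _BEAM_REQ.match(line)]


def source_download_command(requirements_file, dest_dir):
    """Build (but do not run) a pip command that fetches sdists only.

    Forcing source distributions means pip may have to build a package
    just to discover its dependencies, which is the cost described in
    point 2.
    """
    return [
        sys.executable, "-m", "pip", "download",
        "--dest", dest_dir,
        "-r", requirements_file,
        "--no-binary", ":all:",
    ]


if __name__ == "__main__":
    reqs = ["apache-beam[gcp]==2.31.0", "numpy==1.21.0"]
    print(drop_beam_requirement(reqs))
    print(source_download_command("requirements.txt", "/tmp/staging"))
```

Filtering by name is a deliberately narrow choice: it leaves unrelated packages whose names merely start with "apache-beam-" untouched, and it does not attempt to resolve transitive dependencies on Beam.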
--
This message was sent by Atlassian Jira
(v8.3.4#803005)