[ https://issues.apache.org/jira/browse/BEAM-12555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Beam JIRA Bot updated BEAM-12555:
---------------------------------
    Labels: stale-P2  (was: )

> Revisit process of dependency staging in Beam Python
> ----------------------------------------------------
>
>                 Key: BEAM-12555
>                 URL: https://issues.apache.org/jira/browse/BEAM-12555
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Valentyn Tymofieiev
>            Priority: P2
>              Labels: stale-P2
>
> There are a few issues:
> 1) Including Beam itself in requirements.txt causes unnecessary friction and
> is suboptimal: Beam already stages itself to the workers, and Beam workers
> already include Beam's dependencies. This is not clear from
> https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/, yet
> from a user's perspective, including Beam in requirements.txt seems natural.
> 2) Staging the sources of all dependencies listed in requirements.txt, and of
> their transitive dependencies, sometimes involves a hidden package
> recompilation initiated by pip. The reason is that in certain cases pip cannot
> reliably identify a package's dependencies without recompiling it; see [1-3]
> for pointers. This increases the time it takes to launch a Beam job, and may
> require additional software (such as Linux packages with header libraries or
> gcc dependencies) to be available. This behavior causes friction and
> confusion, is not obvious to users, and is beyond Beam's control.
> [1] https://github.com/pypa/pip/issues/8387
> [2] https://github.com/pypa/pip/issues/7995
> [3] https://discuss.python.org/t/pip-download-just-the-source-packages-no-building-no-metadata-etc/4651
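For point 2, a sketch of the kind of pip invocation that staging sources implies. It assumes a source-only download via `pip download --no-binary :all:` and is an illustration, not the exact command Beam's stager runs; `source_download_cmd` is a hypothetical helper that only builds the argument list.

```python
import sys
from typing import List


def source_download_cmd(requirements_file: str, dest_dir: str) -> List[str]:
    """Build (but do not run) a pip command that fetches only source distributions.

    Hypothetical helper for illustration. With "--no-binary :all:" pip ignores
    wheels, so for each sdist it may have to run the package's build backend
    just to read its metadata; for some packages that step compiles code.
    No "--no-deps" flag is passed, so transitive dependencies are fetched too.
    """
    return [
        sys.executable, "-m", "pip", "download",
        "--dest", dest_dir,
        "-r", requirements_file,
        "--no-binary", ":all:",
    ]
```

Running the result with `subprocess.run(cmd, check=True)` on a machine without the needed compilers or header packages is one way to reproduce the failure mode described above.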



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
