On Tue, 4 Sep 2018 at 08:07, Ronald Oussoren via Distutils-SIG
<distutils-sig@python.org> wrote:
>
> On 4 Sep 2018, at 01:51, Nick Coghlan <ncogh...@gmail.com> wrote:
>
> On Mon., 3 Sep. 2018, 5:48 am Ronald Oussoren, <ronaldousso...@mac.com> wrote:
>>
>> What’s the problem with including GPU and non-GPU variants of code in a
>> binary wheel other than the size of the wheel? I tend to prefer binaries
>> that work "everywhere", even if that requires some more work in building
>> binaries (such as including multiple variants of extensions to have
>> optimised code for different CPU variants, such as SSE and non-SSE variants
>> in the past).
>
> As far as I'm aware, binary artifact size *is* the problem. It's just that
> once you're automatically building and pushing an artifact (or an image
> containing that artifact) to thousands or tens of thousands of managed
> systems, the wasted bandwidth from pushing redundant implementations of the
> same functionality becomes more of a concern than the convenience of being
> able to use the same artifact across multiple platforms.
>
> Ok. I’m more used to much smaller deployments where I don’t always know up
> front what the capabilities are of the system that the code will run on.
>
> And looking at tensorflow specifically the difference in size is very much
> significant, the GPU variant is 5 times as large as the non-GPU variant (55MB
> vs 255MB). That’s a good reason for not wanting to unconditionally ship both
> variants.

(Excuse the messed-up quoting - clients seem to use such different conventions
for quoting these days that it's hard to fix things up manually sometimes :-()

Without trying to minimise the impact of this issue, how niche is the problem
we're discussing here? At some point we need to be careful not to cram too
much into tags - and ultimately tags are the only mechanism pip uses to decide
which wheel it's going to install (currently, at least).

If we were to switch to a scheme where installers need to check more
generalised metadata (which is only available after you've downloaded the
wheel and opened it up), that has a significant cost in terms of bandwidth. We
cannot assume that metadata is available without downloading the wheel: PEP
503 allows an index to expose Requires-Python (and could be extended to allow
other metadata), but that's optional, and it does nothing for a case like
pip's `--find-links http://my.server/my/wheel/directory`, which allows a plain
directory to be served over HTTP and provides no metadata other than the
filename.

There's very much an 80-20 question here: we need to avoid letting the needs
of the 20% of projects with unusual requirements complicate usage for the
other 80%. On the other hand, leaving the specialist cases with no viable
solution isn't reasonable either, so even if tags aren't practical here,
finding a way for projects to ship specialised binaries by some other means
would be good.

Just as a completely un-thought-through suggestion, maybe we could have a
mechanism where a small "generic" wheel includes pointers to specialised extra
code that gets downloaded at install time?

    Package X -> x-1.0-cp37-cp37m-win_amd64.whl (includes generic code)
        Metadata - Implementation links:
            If we have a GPU -> <link to an archive of code to be added to the install>
            If we don't have a GPU -> <link to an alternative non-GPU archive>

There are obviously a lot of unanswered questions here, but maybe something
like this would be better than forcing everything into the wheel tags? (See
the sketch in the PS below for a rough idea of how an installer might act on
such metadata.)

Paul
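
PS To make the "implementation links" hand-waving slightly more concrete,
here's a minimal sketch of the kind of post-install step an installer could
run. To be clear, everything in it is invented for illustration - the
implementation-links.json file name, the has_gpu() probe, and the example URLs
are assumptions, not an existing pip feature:

    # Hypothetical post-install step: choose and fetch a variant-specific
    # archive based on metadata shipped inside the generic wheel.
    import json
    import urllib.request

    def has_gpu() -> bool:
        # Stand-in for a real capability probe (e.g. checking for a CUDA
        # runtime). Assume "no GPU" by default in this sketch.
        return False

    def fetch_variant(metadata_path: str, dest: str) -> None:
        # The generic wheel would ship a (hypothetical) file such as
        # implementation-links.json, containing something like:
        #   {"gpu": "https://example.com/x-1.0-gpu.tar.gz",
        #    "no-gpu": "https://example.com/x-1.0-nogpu.tar.gz"}
        with open(metadata_path) as f:
            links = json.load(f)
        url = links["gpu"] if has_gpu() else links["no-gpu"]
        urllib.request.urlretrieve(url, dest)

Run against the metadata file unpacked from the wheel, something along these
lines would pull down only the variant the target machine actually needs,
which is the whole point - the generic wheel stays small.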