On 2018-09-04 11:40:17 -0500 (-0500), Dustin Ingram wrote:
> On Tue, Sep 4, 2018 at 11:33 AM Jeremy Stanley <fu...@yuggoth.org> wrote:
> >
> > Yes. If you haven't tried running a mirror of PyPI lately you're
> > likely not to have noticed, but the various nightly builds for
> > tensorflow seem to be the majority of the data on PyPI now. I'm sure
> > it's a very neat and useful tool, but we basically had to switch
> > from mirroring PyPI in our CI system to using a caching proxy
> > because of this.
> 
> Side note: PyPI now provides a list of the largest packages by total
> filesize: https://pypi.org/stats/
> 
> Depending on what mirror you're using, you may be able to exclude
> these packages from your mirror if you don't need them, e.g. for
> bandersnatch: 
> https://github.com/pypa/bandersnatch/blob/master/docs/filtering_configuration.md#blacklist-filtering-settings

We played whack-a-mole blacklisting some of the largest offenders in
our bandersnatch config for a while, but really needed to rebuild
the mirror from scratch, since there's no easy way to go back and
delete packages that were mirrored before their blacklist entries
were added (and bootstrapping a new mirror is a week+ effort these
days). In the end we just switched to a caching proxy we already had
on hand, because it got us most of the benefit of mirroring with a
tiny fraction of the disk space, given that we use fewer than 1000
packaged Python library dependencies across our CI jobs anyway.
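
For illustration only, the "go back and find what is now blacklisted"
step could be sketched roughly as below. This is not a bandersnatch
feature and not something described above as having been run. The
helper names (normalize, stale_projects, mirrored_names) and the
web/simple/<project>/ directory layout are assumptions made for the
example, and it only lists candidate names rather than deleting
anything.

```python
import re
from pathlib import Path


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def stale_projects(mirrored, blacklist):
    """Return mirrored project names that match a blacklist entry.

    Both sides are compared in normalized form, so "TF_Nightly" and
    "tf-nightly" are treated as the same project.
    """
    blocked = {normalize(name) for name in blacklist}
    return sorted(name for name in mirrored if normalize(name) in blocked)


def mirrored_names(simple_dir):
    """List project directory names under a mirror's simple index.

    Assumption: the mirror keeps one directory per project under
    simple_dir (for example <mirror root>/web/simple/).
    """
    return [p.name for p in Path(simple_dir).iterdir() if p.is_dir()]
```

Calling stale_projects(mirrored_names(path), blacklist) with a
hypothetical mirror path would give a list to review by hand. Removing
the corresponding release files is a separate problem that this sketch
does not attempt.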
-- 
Jeremy Stanley

--
Distutils-SIG mailing list -- distutils-sig@python.org
To unsubscribe send an email to distutils-sig-le...@python.org
https://mail.python.org/mm3/mailman3/lists/distutils-sig.python.org/
Message archived at 
https://mail.python.org/mm3/archives/list/distutils-sig@python.org/message/3SPP3O47YY7OO2UHADY6AA6PDJMKEFDS/
