On 8 February 2017 at 19:14, Thomas Kluyver <[email protected]> wrote:
> What I'm proposing differs in that it would need to download files from PyPI
> - basically all of them, if we're thorough about it. I imagine that's going
> to involve a lot of data transfer. Do we know what order of magnitude we're
> talking about? Is it so large that we should be thinking of running the
> scanner in the same data centre as the file storage?

Last time I asked Donald about doing things like this, he noted that a
full mirror is ~215 GiB. That was a year or two ago so I assume the
number has gone up since then, but it should still be in the same
order of magnitude.
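
For a rough sense of the transfer cost raised above, here's a back-of-the-envelope sketch. The ~215 GiB figure is from this thread; the link speeds are purely illustrative assumptions, not anything measured against PyPI's actual infrastructure:

```python
GIB = 1024 ** 3  # bytes per gibibyte

mirror_bytes = 215 * GIB  # rough full-mirror size cited above

# Naive transfer-time estimates at a few assumed link speeds,
# ignoring protocol overhead and per-file request latency.
link_speeds = {"100 Mbit/s": 100, "1 Gbit/s": 1_000, "10 Gbit/s": 10_000}

for name, mbps in link_speeds.items():
    seconds = (mirror_bytes * 8) / (mbps * 1_000_000)
    print(f"{name}: ~{seconds / 3600:.1f} hours")
```

At 100 Mbit/s that works out to a few hours of raw transfer, so a one-off bulk copy is feasible from anywhere; the argument for co-locating the scanner with the file storage is more about repeated scans and per-file request overhead than a single pass.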

From an ecosystem resilience point of view, there's also a lot to be
said for having copies of the full PyPI bulk artifact store in both
AWS S3 (which is where the production PyPI data lives) and in Azure :)

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia
_______________________________________________
Distutils-SIG maillist  -  [email protected]
https://mail.python.org/mailman/listinfo/distutils-sig