On 8 February 2017 at 19:14, Thomas Kluyver <[email protected]> wrote:
> What I'm proposing differs in that it would need to download files from
> PyPI - basically all of them, if we're thorough about it. I imagine
> that's going to involve a lot of data transfer. Do we know what order of
> magnitude we're talking about? Is it so large that we should be thinking
> of running the scanner in the same data centre as the file storage?
Last time I asked Donald about doing things like this, he noted that a
full mirror is ~215 GiB. That was a year or two ago, so I assume the
number has gone up since then, but it should still be in the same order
of magnitude.

From an ecosystem resilience point of view, there's also a lot to be said
for having copies of the full PyPI bulk artifact store in both AWS S3
(which is where the production PyPI data lives) and in Azure :)

Cheers,
Nick.

--
Nick Coghlan | [email protected] | Brisbane, Australia
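
P.S. If anyone wants a fresher number without actually pulling down a
full mirror, something like the sampling sketch below should get within
an order of magnitude. This is only a sketch, assuming the pypi.org
simple index still serves one anchor tag per project and the per-project
JSON API still reports a size for each released file; package sizes are
heavily skewed, so a sample of a few hundred only buys a rough figure.

# Rough order-of-magnitude estimate of the total size of PyPI's
# artifact store: sample a few hundred projects from the simple
# index, sum the file sizes reported by the per-project JSON API,
# and extrapolate across all projects.
import json
import random
import re
import urllib.request

SIMPLE_INDEX = "https://pypi.org/simple/"
SAMPLE_SIZE = 200  # arbitrary; a bigger sample smooths out the skew

def project_names():
    """Scrape project names out of the simple index HTML."""
    html = urllib.request.urlopen(SIMPLE_INDEX).read().decode("utf-8")
    return re.findall(r"<a[^>]*>([^<]+)</a>", html)

def project_bytes(name):
    """Total bytes across every released file for one project."""
    url = "https://pypi.org/pypi/{}/json".format(name)
    try:
        with urllib.request.urlopen(url) as resp:
            data = json.load(resp)
    except Exception:
        return 0  # deleted project, network hiccup, etc.
    return sum(f["size"]
               for files in data.get("releases", {}).values()
               for f in files)

names = project_names()
sample = random.sample(names, SAMPLE_SIZE)
sampled = sum(project_bytes(n) for n in sample)
estimate = sampled * len(names) / SAMPLE_SIZE
print("{} projects, estimated total ~{:.0f} GiB".format(
    len(names), estimate / 2 ** 30))

For an actual scanning run you'd want bandersnatch (the PEP 381 mirror
client) rather than anything hand-rolled, since it knows how to keep a
local mirror in sync incrementally instead of re-fetching everything.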
