On Tue, Sep 27, 2011 at 1:25 PM, Tarek Ziadé <[email protected]> wrote:
> On Tue, Sep 27, 2011 at 5:35 PM, Jim Fulton <[email protected]> wrote:
...
>> But I don't want to have to update buildout *just* because of an itch
>> to have a custom protocol.
>
> I kind of wonder how hard it would be to have a standalone pypi
> download client, ripped off from python 3.3's packaging, so you would
> not have to worry about this.

I doubt I'm going to be able to avoid worrying about it. Still, a
reference client implementation would be useful.

> And, well, you do not sound like you want to spend time in these
> matters in any case,

I don't know what you mean. Not sure I care. :)

> so if someone brings a patch I hope you will not
> refuse it.

No. I'll eventually implement it if no one else does.

>>> But the use case is usually: PyPI is down, we fallback to a mirror. I
>>> don't think it's more complicated than this.
>>
>> I don't agree. On multiple levels. PYPI is often up but slow.
>
> That's an orthogonal issue : any server can be slow.

A service can be fast even if an individual server is slow. Also, CDNs
can make lots of horsepower available that is shared among multiple
customers. I really doubt that anything we build will be faster.

> One better way to drastically speed up buildout is to download /
> build stuff in parallel imo.

That's true and something I'd like to do at some point. That's one of
the reasons I expect I'll have to worry about the protocol.

>
>> It's also in the wrong place. A CDN should provide better performance,
>> reliability and locality.
>
> Locality is indeed important, and picking up the nearest server is great.
> Reliability is also solved by the mirrors.

At the expense of increased complexity on the client.

>>
>> A client has to:
>>
>> - try pypi
>> - fallback to "last"
>> - If that's down, decide what other indexes to check
>>
>> I don't see how having timestamps help unless you know
>> what the current timestamp is, unless you say that you'll reject
>> a mirror with a timestamp more than some period in the past.
>
> How hard it is to make those decisions ?
It's not "hard" conceptually, but it's still a lot of implementation
complexity and a lot of extra network requests.

> Do you really think getting the current timestamp is that hard ?
>
> And the mirror timestamp,
>
> http://b.pypi.python.org/last-modified
>
> In all you've said I fail to see how complicated it is, or long to do.

That's an extra HTTP request I need to make when I'm considering use
of a mirror. If the first mirror I check seems to be out of date,
I may need to check all the mirrors. It's an open question what
should be considered potentially out of date: a timestamp older than
an hour? A day?

> The ordering I see is:
>
> normal behavior:
>  - if the cache is too old:

How old is too old?

>    get the list of mirrors (-> the list of
>    mirrors and their timestamps get cached)

They'll only get cached for the program invocation. This means I have
to potentially check lots of mirrors every time someone runs buildout.
I can reduce latency by doing this in parallel, but that's still
a lot of requests.

>  - pick the closest one

How do I decide what's closest? Did you mean closest, or most up to date?

>  - use it
>
> the server times out:
>  - try the "next closest"
>
>
>> It's not clear what this time delta should be and, in any case,
>> the client needs to first validate a mirror by checking its timestamp.
>
> This is the job of the client yes. An option that says, discard
> mirrors that are > 1 day, or 5 hours etc.

"etc" is just waving hands. Selecting the right value is hard, possibly
application dependent. Is this a configuration variable? Now the
user has something to deal with.

> Keeping a local cache that gets updated eventually is sufficient.

In process, or on disk? This just gets better and better. :)

>> I think this protocol is going to be hard to get right.
>
> Maybe ? but if a v1 allows us to switch from server 1 being down to
> server 2, it's already a success, no ?
>
> servers that *we* the community, manage.
I fail to see why this is inherently a good thing. I don't like
"managing" things. Less work is good.

...

> Do we really want Amazon to handle PyPI ?

Yes, or Rackspace, or Google, or AOL, or whatever. Just not us.
(I suspect some of these might even do it for free.)

> I prefer a bunch of community mirrors. Heck, I have one at Mozilla,
> and might make it public one day :)
>
> Or maybe the optimal solution is our own CDN proxy so we don't deal
> with this on client side.
>
> <music in the background with trumpets, a flag with the Python logo
> raises, slowly>

Uh, yeah, sure.

FWIW, it hadn't occurred to me to use a CDN until a conversation a few
days ago. Doh.

Jim

--
Jim Fulton
http://www.linkedin.com/in/jimfulton
_______________________________________________
Distutils-SIG maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/distutils-sig
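
For reference, the client-side fallback debated in this thread (try the
main index, then walk the mirrors, rejecting any whose last-modified
timestamp is older than some cutoff) might look roughly like the sketch
below. Everything in it is an assumption made for illustration:
pick_index, is_up, fetch_last_modified and max_age are hypothetical
names, not part of buildout, distutils or PyPI, and the two callables
stand in for real HTTP requests (for example a GET of a mirror's
/last-modified resource) whose response format is not assumed here.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, Optional


def pick_index(
    primary: str,
    mirrors: Iterable[str],
    is_up: Callable[[str], bool],
    fetch_last_modified: Callable[[str], Optional[datetime]],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> Optional[str]:
    """Return the primary index if it responds, else the first fresh mirror.

    Hypothetical helper: `is_up` and `fetch_last_modified` are injected so
    the sketch stays independent of any real HTTP client.
    """
    if is_up(primary):
        return primary
    now = now or datetime.now(timezone.utc)
    for mirror in mirrors:
        # One extra request per candidate mirror; None means unreachable.
        stamp = fetch_last_modified(mirror)
        if stamp is not None and now - stamp <= max_age:
            return mirror
    # Every mirror was unreachable or staler than max_age.
    return None
```

Even in this small form the open questions from the thread are visible:
max_age has to come from somewhere (a default, or a user-facing
option), and each rejected mirror costs another round trip unless the
timestamps are cached or fetched in parallel.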
