Great write-up! I actually find the async nature of HTTP (both versions) a compelling reason to switch to asyncio. For HTTP/1.1 this sounds mostly like it would make the implementation easier; for HTTP/2 it sounds like it would just be better for the user-side as well (if the user just wants one resource they can safely continue to use the synchronous HTTP/1.1 version of the API.)
On Fri, Jun 9, 2017 at 9:55 AM, Cory Benfield <c...@lukasa.co.uk> wrote:

> On 9 Jun 2017, at 17:28, Guido van Rossum <gu...@python.org> wrote:
>
> > At least one of us is still confused. The one-event-loop-per-thread
> > model is supported in asyncio without passing the loop around
> > explicitly. The get_event_loop() implementation stores all its state
> > in a thread-local instance, so it returns the thread's event loop.
> > (Because this is an "advanced" model, you have to explicitly create
> > the event loop with new_event_loop() and make it the default loop for
> > the thread with set_event_loop().)
>
> Aha, ok, so the confused one is me. I did not know this. =) That
> definitely works a lot better. It admittedly works less well if someone
> is doing their own custom event loop stuff, but that’s probably an
> acceptable limitation up until the time that Python 2 goes quietly into
> the night.
>
> > All in all, I'm a bit curious why you would need to use asyncio at all
> > when you've got a thread per request anyway.
>
> Yeah, so this is a bit of a diversion from the original topic of this
> thread but I think it’s an idea worth discussing in this space. I want
> to reframe the question a bit if you don’t mind, so shout if you think
> I’m not responding to quite what you were asking. In my understanding,
> the question you’re implicitly asking is this:
>
> “If you have a thread-safe library today (that is, one that allows users
> to do threaded I/O with appropriate resource pooling and management),
> why move to a model built on asyncio?”
>
> There are many answers to this question that differ for different
> libraries with different uses, but for HTTP libraries like urllib3 here
> are our reasons.
>
> The first is that it turns out that even for HTTP/1.1 you need to write
> something that amounts to a partial event loop to properly handle the
> protocol. Good HTTP clients need to watch for responses while they’re
> uploading body data, because if a response arrives during that process,
> body upload should be terminated immediately. This is also required for
> sensibly handling things like Expect: 100-continue, as well as spotting
> other intermediate responses and connection teardowns sensibly and
> without throwing exceptions.
>
> Today urllib3 does not do this, and it has caused us pain, so our v2
> branch includes a backport of the Python 3 selectors module and a
> hand-written partially-complete event loop that only handles the
> specific cases we need. This is an extra thing for us to debug and
> maintain, and ultimately it’d be easier to just delegate the whole thing
> to event loops written by others who promise to maintain them and make
> them efficient.
>
> The second answer is that I believe good asyncio support in libraries is
> a vital part of the future of this language, and “good” asyncio support
> IMO does as little as possible to block the main event loop. Running all
> of the complex protocol parsing and state manipulation of the Requests
> stack on a background thread is not cheap, and involves a lot of GIL
> swapping around. We have found several bug reports complaining about
> using Requests with largish numbers of threads, indicating that our big
> stack of Python code really does cause contention on the GIL if used
> heavily. In general, having to defer to a thread to run *Python* code in
> asyncio is IMO a nasty anti-pattern that should be avoided where
> possible. It is much less bad to defer to a thread to then block on a
> syscall (e.g. to get an “async” getaddrinfo), but doing so to run a big
> big stack of Python code is vastly less pleasant for the main event
> loop.
>
> For this reason, we’d ideally treat asyncio as the first-class citizen
> and retrofit on the threaded support, rather than the other way around.
> This goes doubly so when you consider the other reasons for wanting to
> use asyncio.
>
> The third answer is that HTTP/2 makes all of this much harder. HTTP/2 is
> a *highly* concurrent protocol. Connections send a lot of control frames
> back and forth that are invisible to the user working at the semantic
> HTTP level but that nonetheless need relatively low-latency turnaround
> (e.g. PING frames). It turns out that in the traditional synchronous
> HTTP model urllib3 only gets access to the socket to do work when the
> user calls into our code. If the user goes a “long” time without calling
> into urllib3, we take a long time to process any data off the
> connection. In the best case this causes latency spikes as we process
> all the data that queued up in the socket. In the worst case, this
> causes us to lose connections we should have been able to keep because
> we failed to respond to a PING frame in a timely manner.
>
> My experience is that purely synchronous libraries handling HTTP/2
> simply cannot provide a positive user experience. HTTP/2 flat-out
> *requires* either an event loop or a dedicated background thread, and in
> practice in your dedicated background thread you’d also just end up
> writing an event loop (see answer 1 again). For this reason, it is
> basically mandatory for HTTP/2 support in Python to either use an event
> loop or to spawn out a dedicated C thread that does not hold the GIL to
> do the I/O (as this thread will be regularly woken up to handle I/O
> events).
>
> Hopefully this (admittedly horrifyingly long) response helps illuminate
> why we’re interested in asyncio support. It should be noted that if we
> find ourselves unable to get it in the short term we may simply resort
> to offering an “async” API that involves us doing the rough equivalent
> of running in a thread-pool executor, but I won’t be thrilled about it.
> ;)
>
> Cory

--
--Guido van Rossum (python.org/~guido)
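
For readers who want to see the one-event-loop-per-thread model from the quoted exchange in code, here is a minimal sketch using only the standard asyncio and threading modules. The names fetch, worker and run_workers are hypothetical and exist only for illustration, and asyncio.sleep stands in for real network I/O; this is not how urllib3 or Requests is implemented.

```python
import asyncio
import threading


async def fetch(name):
    # Hypothetical coroutine: asyncio.sleep stands in for real network I/O.
    await asyncio.sleep(0.01)
    return name.upper()


def worker(name, results):
    # The "advanced" model: each thread explicitly creates its own loop
    # and installs it as that thread's default.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        # No loop is passed around: get_event_loop() returns the loop
        # installed for the current thread.
        assert asyncio.get_event_loop() is loop
        results[name] = (loop, loop.run_until_complete(fetch(name)))
    finally:
        loop.close()
        asyncio.set_event_loop(None)


def run_workers(names):
    # One thread per name, each driving its own private event loop.
    results = {}
    threads = [
        threading.Thread(target=worker, args=(name, results))
        for name in names
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Calling run_workers(["a", "b", "c"]) should give one entry per name, each holding a distinct loop object and the value its coroutine returned. The loop objects are stored only so that a caller can check that every thread really had its own loop.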
_______________________________________________
Async-sig mailing list
Async-sig@python.org
https://mail.python.org/mailman/listinfo/async-sig
Code of Conduct: https://www.python.org/psf/codeofconduct/