> On 9 Jun 2017, at 17:28, Guido van Rossum <gu...@python.org> wrote:
> 
> At least one of us is still confused. The one-event-loop-per-thread model is 
> supported in asyncio without passing the loop around explicitly. The 
> get_event_loop() implementation stores all its state in a thread-local 
> instance, so it returns the thread's event loop. (Because this is an 
> "advanced" model, you have to explicitly create the event loop with 
> new_event_loop() and make it the default loop for the thread with 
> set_event_loop().)

Aha, ok, so the confused one is me. I did not know this. =) That definitely 
works a lot better. It admittedly works less well if someone is doing their own 
custom event loop stuff, but that’s probably an acceptable limitation up until 
the time that Python 2 goes quietly into the night.
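For anyone else following along who also didn't know this, here's a minimal
sketch of the pattern as I understand it (names here are mine, just for
illustration):

    import asyncio
    import threading

    async def show_loop():
        # get_event_loop() consults thread-local state, so each thread
        # sees the loop it installed in worker() below.
        loop = asyncio.get_event_loop()
        print("thread", threading.get_ident(), "-> loop", id(loop))

    def worker():
        # The "advanced" model: explicitly create a loop and make it
        # this thread's default.
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        try:
            loop.run_until_complete(show_loop())
        finally:
            loop.close()

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()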

> All in all, I'm a bit curious why you would need to use asyncio at all when 
> you've got a thread per request anyway.

Yeah, so this is a bit of a diversion from the original topic of this thread 
but I think it’s an idea worth discussing in this space. I want to reframe the 
question a bit if you don’t mind, so shout if you think I’m not responding to 
quite what you were asking. In my understanding, the question you’re implicitly 
asking is this:

"If you have a thread-safe library today (that is, one that allows users to do 
threaded I/O with appropriate resource pooling and management), why move to a 
model built on asyncio?”

There are many answers to this question that differ for different libraries 
with different uses, but for HTTP libraries like urllib3 here are our reasons.

The first is that, even for HTTP/1.1, it turns out you need to write something 
that amounts to a partial event loop to handle the protocol properly. Good 
HTTP clients need to watch for responses while they’re uploading body data, 
because if a response arrives mid-upload the body upload should be terminated 
immediately. The same machinery is needed to handle things like Expect: 
100-continue sensibly, and to spot other interim responses and connection 
teardowns without throwing exceptions.

Today urllib3 does not do this, and it has caused us pain, so our v2 branch 
includes a backport of the Python 3 selectors module and a hand-written 
partially-complete event loop that only handles the specific cases we need. 
This is an extra thing for us to debug and maintain, and ultimately it’d be 
easier to just delegate the whole thing to event loops written by others who 
promise to maintain them and make them efficient.
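To give a flavour of what that hand-written machinery has to do, here's a
simplified sketch (not our actual code, and the helper name is made up) of
uploading a body while watching for an early response, using the selectors
module:

    import selectors

    def send_body(sock, body_chunks):
        # Hypothetical helper: upload body_chunks on a connected socket,
        # but stop immediately if the server sends anything back (an
        # early response, 100-continue handling, a teardown, etc.).
        sel = selectors.DefaultSelector()
        sel.register(sock, selectors.EVENT_READ | selectors.EVENT_WRITE)
        try:
            chunks = iter(body_chunks)
            pending = next(chunks, None)
            while pending is not None:
                for _key, events in sel.select():
                    if events & selectors.EVENT_READ:
                        # The server has spoken: abort the upload and
                        # let the caller go read the response.
                        return False
                    if events & selectors.EVENT_WRITE:
                        sent = sock.send(pending)
                        pending = pending[sent:] or next(chunks, None)
                        if pending is None:
                            break
            return True
        finally:
            sel.unregister(sock)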

The second answer is that I believe good asyncio support in libraries is a 
vital part of the future of this language, and “good” asyncio support IMO does 
as little as possible to block the main event loop. Running all of the complex 
protocol parsing and state manipulation of the Requests stack on a background 
thread is not cheap, and involves a lot of passing the GIL back and forth. We 
have found several bug reports complaining about using Requests with largish 
numbers of threads, indicating that our big stack of Python code really does 
cause contention on the GIL if used heavily. In general, having to defer to a 
thread to run *Python* code in asyncio is IMO a nasty anti-pattern that should 
be avoided where possible. It is much less bad to defer to a thread that then 
blocks on a syscall (e.g. to get an “async” getaddrinfo), but doing so to run 
a big stack of Python code is vastly less pleasant for the main event loop.
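For comparison, here's the kind of thread-deferral I think is acceptable: the
worker thread spends its time blocked in C rather than grinding through Python
bytecode (a sketch, not a real API):

    import asyncio
    import socket

    async def async_getaddrinfo(host, port):
        # The worker thread blocks in C inside getaddrinfo(3) and barely
        # touches the GIL, so the event loop stays responsive.
        loop = asyncio.get_event_loop()
        return await loop.run_in_executor(
            None, socket.getaddrinfo, host, port)

(asyncio's own loop.getaddrinfo() does essentially this by default.)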

For this reason, we’d ideally treat asyncio as the first-class citizen and 
retrofit the threaded support on top, rather than the other way around. This 
goes double when you consider the other reasons for wanting to use asyncio.

The third answer is that HTTP/2 makes all of this much harder. HTTP/2 is a 
*highly* concurrent protocol. Connections send a lot of control frames back and 
forth that are invisible to the user working at the semantic HTTP level but 
that nonetheless need relatively low-latency turnaround (e.g. PING frames). It 
turns out that in the traditional synchronous HTTP model urllib3 only gets 
access to the socket to do work when the user calls into our code. If the user 
goes a “long” time without calling into urllib3, we take a long time to process 
any data off the connection. In the best case this causes latency spikes as we 
process all the data that queued up in the socket. In the worst case, this 
causes us to lose connections we should have been able to keep because we 
failed to respond to a PING frame in a timely manner.

My experience is that purely synchronous libraries handling HTTP/2 simply 
cannot provide a positive user experience. HTTP/2 flat-out *requires* either an 
event loop or a dedicated background thread, and in practice in your dedicated 
background thread you’d also just end up writing an event loop (see answer 1 
again). For this reason, HTTP/2 support in Python basically has to either use 
an event loop or spawn a dedicated C thread that does not hold the GIL to do 
the I/O (since that thread will be woken up regularly to handle I/O events).
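To make that concrete, here's a rough sketch of how an event loop makes
low-latency frame handling natural, using asyncio's protocol API with hyper-h2
(heavily simplified, error handling omitted):

    import asyncio
    import h2.connection

    class H2Protocol(asyncio.Protocol):
        def connection_made(self, transport):
            self.transport = transport
            self.conn = h2.connection.H2Connection()
            self.conn.initiate_connection()
            transport.write(self.conn.data_to_send())

        def data_received(self, data):
            # Called the moment bytes arrive, regardless of what the
            # user is doing. hyper-h2 queues the PING ACK (and other
            # control responses) internally; we just flush them to the
            # wire, so the server never waits on user code.
            events = self.conn.receive_data(data)
            self.transport.write(self.conn.data_to_send())
            # ... dispatch `events` to user-facing code as needed ...

A synchronous client has no equivalent of data_received: nothing runs until
the user next calls in, which is exactly the problem described above.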

Hopefully this (admittedly horrifyingly long) response helps illuminate why 
we’re interested in asyncio support. It should be noted that if we find 
ourselves unable to get it in the short term we may simply resort to offering 
an “async” API that involves us doing the rough equivalent of running in a 
thread-pool executor, but I won’t be thrilled about it. ;)
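Concretely, that fallback would look something like this (a hypothetical API,
not something we've committed to):

    import asyncio
    from functools import partial

    import urllib3

    _pool = urllib3.PoolManager()

    async def request(method, url, **kwargs):
        # Runs the entire blocking urllib3 stack on a worker thread: it
        # quacks like async, but all the GIL contention from the second
        # answer above is still there.
        loop = asyncio.get_event_loop()
        return await loop.run_in_executor(
            None, partial(_pool.request, method, url, **kwargs))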

Cory 
