I just want to cover a few more things I didn't get to in my reply to
Aymeric.

On Fri, May 6, 2016 at 9:11 AM, Donald Stufft <don...@stufft.io> wrote:
>
>
> In short, I think that the message bus adds an additional layer of
> complexity that makes everything a bit more complex and complicated for
> very little actual gain over other possible, but less complex solutions.
> This message bus also removes a key part of the amount of control that
> the server which is *actually* receiving the connection has over the
> lifetime and process of the eventual request.
>

True; however, having a message bus/channel abstraction also removes a
layer of complexity: the need to care about socket handling, and the risk
of sinking your performance by doing even a slightly blocking operation.

In an ideal world we'd have some magical language that let us all write
amazing async code and that detected all possible deadlocks or livelocks
before they happened, but that's not yet the case, and I think the worker
model has been a good substitute for it in software design generally.
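
To make that concrete, here's roughly what the worker side looks like (a
minimal sketch paraphrasing the current consumer API from memory;
do_expensive_database_query is a made-up stand-in for any blocking
business logic): the consumer never touches a socket, so a blocking call
only ever occupies one worker process.

    # Sketch of a worker-side WebSocket consumer; nothing here knows or
    # cares about the underlying socket.
    def ws_message(message):
        # Blocking here only ties up this worker; the event loop in
        # Daphne holding thousands of open sockets is unaffected.
        result = do_expensive_database_query(message.content["text"])
        message.reply_channel.send({"text": result})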


>
> For example, in traditional HTTP servers where you have an open
> connection associated with whatever view code you're running, whenever
> the client disconnects you're given a few options of what you can do, but
> the most common option in my experience is that once the connection has
> been lost the HTTP server cancels the execution of whatever view code it
> had been running [1]. This allows a single process to serve more by
> shedding the load of connections that have since been disconnected for
> some reason; however, in ASGI, since there's no way to remove an item
> from the queue or cancel it once it has begun to be processed by a worker
> process, you lose out on this ability to shed the load of processing a
> request once it has already been scheduled.
>

But as soon as you introduce a layer like Varnish into the equation, you've
lost this anyway, as you're no longer seeing the true client socket.
Abandoned requests are an existing problem with HTTP and WSGI; I see them
in our logs all the time.


>
> This additional complexity incurred by the message bus also ends up
> requiring additional complexity layered onto ASGI to try and re-invent
> some of the "natural" features of TCP and/or HTTP (or whatever the
> underlying protocol is). An example of this would be the ``order``
> keyword in the WebSocket spec, something that isn't required and just
> naturally happens whenever you're directly connected to a websocket,
> because the ``order`` is just whatever bytes come in off the wire. This
> also gets exposed in other features, like backpressure: ASGI doesn't
> currently have a concept of allowing the queue to apply back pressure to
> the web connection, though now Andrew has started to come around to the
> idea of adding a bound to the queue (which is good!). If the indirection
> of the message bus hadn't been added, then backpressure would have
> naturally occurred whenever you ended up getting enough things processing
> that it blocked new connections from being ``accept``ed, which would
> eventually fill up the backlog and then make new connections block
> waiting to connect. Now it's good that Andrew is adding the ability to
> bound the queue, but that is something that is going to require care to
> tune in each individual deployment (and will need to be regularly
> re-evaluated) rather than something that just occurs naturally as a
> consequence of the design of the system.
>

Client buffers in OSs were also manually tuned to begin with; I suspect we
can home in on how to make this work best over time once we have more
experience with how it runs in the wild. I don't disagree that I'm
reinventing existing features of TCP sockets, but it's also a mix of UDP
features too; there's a reason a lot of modern protocols build on UDP
instead of TCP, and I'm trying to strike the balance.
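
As a toy illustration of how a bound turns into backpressure (purely
illustrative, not the actual channel layer API; the helper function is
made up): once the queue has a capacity, the interface server gets an
explicit signal to shed load instead of buffering forever, much like a
full TCP accept backlog would.

    import queue

    # Illustrative only: a bounded in-process "channel" standing in for
    # the channel layer, showing how a capacity limit becomes backpressure.
    requests = queue.Queue(maxsize=100)  # capacity is the tuning knob

    def enqueue_request(message):
        """Hypothetical helper the interface server would call."""
        try:
            requests.put_nowait(message)
            return True
        except queue.Full:
            # Workers are behind; the server can now shed load (e.g. send
            # a 503) rather than buffering without limit.
            return False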


>
> Anytime you add a message bus you need to make a few trade-offs; the
> particular trade-off that ASGI made is that it should prefer "at most
> once" delivery of messages and low latency to guaranteed delivery. This
> choice is likely one of the sanest ones you can make in regards to which
> trade-offs you make for the design of ASGI, but in that trade-off you end
> up with new problems that don't exist otherwise. For example, HTTP/1 has
> the concept of pipelining, which allows you to make several HTTP requests
> on a single HTTP connection without waiting for the responses before
> sending each one. Given the nature of ASGI it would be very difficult to
> actually support this feature without either violating the RFC or forcing
> either Daphne or the queue to buffer potentially huge responses while it
> waits for another request that came before it to be finished, whereas
> again you get this for free using either async IO (you just don't await
> the result of that second request until the first request has been
> processed) or with WSGI if you're using generators (you just don't
> iterate over the result until you're ready for it).
>

Even with asyncio that data has to be buffered somewhere, whether it's in
the client transmit buffer, the receiving OS buffer, or Python memory. If
Daphne refuses to read() more from a socket it got an HTTP/1.1 pipelined
request on before the response for the first one comes back, that would
achieve the same effect as asyncio, no? (This may in fact be what it does
already; I need to check the twisted.web pipeline handling.)
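
Roughly what I have in mind, as a bare asyncio sketch (not Daphne's actual
code, just the pause-reading idea; the protocol class and its "parsing"
are simplified stand-ins):

    import asyncio

    class PipelinedHTTPProtocol(asyncio.Protocol):
        """Sketch: stop reading further pipelined requests until the
        current response has gone out, so buffering stays in the kernel
        and client buffers rather than in application memory."""

        def connection_made(self, transport):
            self.transport = transport

        def data_received(self, data):
            # Pretend `data` is exactly one complete request; stop
            # reading any pipelined follow-ups for now.
            self.transport.pause_reading()
            asyncio.ensure_future(self.handle_request(data))

        async def handle_request(self, request_bytes):
            response = b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"
            self.transport.write(response)
            # Only now let the next pipelined request in off the wire.
            self.transport.resume_reading()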


>
> ASGI purports to make it easier to gracefully restart your servers by
> making it
> possible to restart the worker servers (since there is no long live open
> connections to them) and simply spin up new ones. However, that's not
> really
> the whole story, because while that is true, it really only exists as long
> as
> your code changes don't touch something that Daphne needs to be aware of in
> order to process incoming requests. As soon as Daphne needs restarted then
> you're back in the same boat of needing another solution to graceful
> restarts
> and since Daphne depends on project specific code, it's going to require
> to be
> restarted much more frequently than other solutions that don't. It appears
> to
> me like it would be difficult to be able to automatically determine
> whether or
> not Daphne needs a restart on any particular deployment, so it will be
> common
> for people to just need to restart the whole stack anyways.
>

Daphne only depends on one tiny piece of project code, the channel layer
configuration. I don't imagine that changing nearly as often as actual
business logic. You're right that once there's a new Daphne version or that
config changes, it needs a restart too, but that's not going to be very
common.
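
For concreteness, that "one tiny piece" is roughly the settings block
below (the shape follows the Channels/asgi_redis configuration format as I
recall it; treat the exact keys, hosts, and the routing module path as
illustrative):

    # settings.py -- the only project-specific thing Daphne needs to load.
    CHANNEL_LAYERS = {
        "default": {
            "BACKEND": "asgi_redis.RedisChannelLayer",
            "CONFIG": {
                # Where the message bus actually lives.
                "hosts": [("localhost", 6379)],
            },
            # Hypothetical module path for the channel routing.
            "ROUTING": "myproject.routing.channel_routing",
        },
    }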


>
> So what sort of solution would I personally advocate had I the time or
> energy to do so? I would look towards what sort of pure Python API (like
> WSGI itself) could be added to allow a web server to pass websockets down
> into Django. I admit that in some cases people would then need to layer
> on their own message buses (since that's just about the only reasonable
> way to implement something like Group().send()) but even here they'd be
> able to get added gains and "for free" features by utilizing something
> that specializes in this sort of multicast type of message (a pub/sub
> message bus more or less). Of course currently no web servers would
> support whatever this new "WSGI but for WebSockets" would be, so you'd
> need to implement something like Daphne that could handle it in the
> interim (or possibly forever if nobody implemented it), but that's the
> same case as with ASGI now.
>
> Handling scaling out to multiple processes and graceful restarts would be
> handled the way they are today. Either you'd have some master process
> that isn't specific to the Django code (like Daphne is) that would spin
> up new processes, start sending traffic to them, and then close out the
> old processes. This generalizes out past a single machine too, where
> you'd have something like HAProxy load balancing between machines and
> able to gracefully stop sending requests to one instance and start
> sending them to a new instance. For Websockets, anytime you have a
> persistent connection to your worker you'll need some way to trigger your
> clients to disconnect and reconnect (so they get scheduled onto the new
> server/process), but that's something you'll need with ASGI anyway
> anytime you need to restart Daphne (and since the thing initiating the
> restart there is tied to your application code, a hook can be provided
> that gets called on shutdown and lets the application do some
> application-specific thing to tell people to reconnect).
>
> In this solution, since everything is just HTTP (or Websockets, or
> whatever) all the way down, you end up getting to reuse all of the
> battle-tested pieces that already exist, like HAProxy. It's also easier
> to simply drop in another piece, possibly written in another language or
> another technology, since everywhere in the stack speaks HTTP/Websocket
> and you don't have to go and teach, say, Erlang how to ASGI.
>

I agree with the desire to use things like HAProxy in the stack, but I
think your idea of handling WebSockets natively in Django is far more
difficult and fragile than Channels is, mostly due to our ten-year history
of synchronous code. We would have to audit a large amount of the codebase
to ensure it was all async-compatible, not to mention drop Python 2
support, before we'd even get close.

I'm not saying my solution is perfect; I'm saying it's pragmatic given our
current position and likely future position. Channels adds a spectrum to
Django: you can run it on anything from a single process, to a single
machine (with the IPC channel layer), to a cluster of machines.
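
To illustrate that spectrum (the backend module paths here are my best
recollection of the layer packages available today; treat them as
assumptions), switching deployment scale is just a matter of swapping the
BACKEND string:

    # Illustrative variants of the same CHANNEL_LAYERS setting.
    CHANNEL_LAYERS = {
        "default": {
            # Single process, no external dependencies:
            # "BACKEND": "asgiref.inmemory.ChannelLayer",
            # Single machine, multiple processes over IPC:
            "BACKEND": "asgi_ipc.IPCChannelLayer",
            # Cluster of machines, Redis-backed:
            # "BACKEND": "asgi_redis.RedisChannelLayer",
            # Hypothetical routing module path.
            "ROUTING": "myproject.routing.channel_routing",
        },
    }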

I look forward to Python async being in a better place in five to ten
years so we can revisit this and improve things (but hopefully keep a
similar end-developer API, which I think is quite nice to use and reflects
URL routing and view writing in a nice way). For now, though, I believe we
need something that works well today, which means taking a few tradeoffs
along the way; after all, it's not going to be forced on anyone, and WSGI
will still be there for a long time to come*.

(*At least until I get around to working out what an in-process asyncio
WSGI replacement with WebSocket support might look like)

Andrew
