On Thu, May 5, 2016 at 12:34 PM, Mark Lavin <markdla...@gmail.com> wrote:
> After somewhat hijacking another thread (https://groups.google.com/d/msg/django-developers/t_zuh9ucSP4/eJ4TlEDMCAAJ), I thought it was best to start fresh and clearly spell out my feelings about the Channels proposal. To start, this discussion of "Django needs a websocket story" reminds me very much of the discussions about NoSQL support. There were proofs of concept made and sky-is-falling arguments about how Django would fail without MongoDB support. But in the end the community concluded that `pip install pymongo` was the correct way to integrate MongoDB into a Django project. In the same way, it has been possible for quite some time to incorporate websockets into a Django project by running a separate server dedicated to handling those connections in a framework such as Twisted, Tornado, aiohttp, etc., and establishing a clear means by which the two servers communicate with one another as needed by the application. Now this is quite vague and ad hoc, but it does work. To me this is the measuring stick by which to judge Channels. In what ways is it better or worse than running a separate server process for long-lived vs. short-lived HTTP connections?

The main gains are (in my opinion):

- The same server process can serve both HTTP and WebSockets without path prefixing (auto-negotiation based on the Upgrade header); without this you need an extra web layer in front to route requests to the right backend server.
- HTTP long-polling is supported via the same mechanism (like WebSockets, it does not fit inside the WSGI paradigm in a performant way).
- You get to run fewer processes overall.

That said, I don't see everyone running over to use Daphne in production, which is why it's entirely reasonable to run two servers: one for HTTP and one for WebSockets. Channels fully supports this, whether you run the HTTP servers as self-contained WSGI servers or make them forward onto the ASGI layer via the adapter.
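To illustrate the Upgrade-header auto-negotiation mentioned above, here is a minimal sketch (a hypothetical helper, not Daphne's actual code) of how a single interface server can decide whether an incoming request is plain HTTP or a WebSocket handshake:

```python
# Hypothetical sketch: deciding HTTP vs WebSocket from request headers.
# Not Daphne's actual implementation - just the negotiation idea.

def pick_protocol(headers):
    """Return 'websocket' if the request asks to upgrade, else 'http'."""
    # Header names (and these particular values) are case-insensitive,
    # so normalise everything to lowercase before comparing.
    normalized = {k.lower(): v.lower() for k, v in headers.items()}
    wants_upgrade = "upgrade" in normalized.get("connection", "")
    if wants_upgrade and normalized.get("upgrade") == "websocket":
        return "websocket"
    return "http"
```

A server doing this on one port needs no fronting proxy to split traffic; without it, you would have to configure your web layer to route, say, a path prefix to the separate async server.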
> At the application development level, Channels has the advantage of a clearly defined interprocess communication which would otherwise need to be written. However, the Channel API is built more around a simple queue/list rather than a full messaging layer. The choices of backends are currently limited to in-memory (not suitable for production), the ORM DB (not suitable for production), and Redis. While Redis PUB/SUB is nice for fanout/broadcast messaging, it isn't a proper message queue. It also doesn't support TLS out of the box. For groups/broadcast the Redis Channel backend also doesn't use PUB/SUB but instead emulates that feature. It likely can't use PUB/SUB due to the choice of sharding. This seemingly ignores robust existing solutions like Kombu, which is designed around AMQP concepts. Kombu supports far more transports than the Channel backends while emulating the same features, such as groups/fanout, and more, such as topic exchanges, QoS, message acknowledgement, compression, and additional serialization formats.

Firstly, nothing in channels uses pub/sub - channels deliver to a single reader of a queue, and thus cannot be built on a broadcast solution like pub/sub. asgi_redis, the backend you're discussing, instead uses Redis lists containing the names of expiring Redis string keys, with the data encoded using msgpack; it uses LPOP or BLPOP to wait on the queue and get messages. It has built-in sharding support based on consistent hashing (with separate handling for messages to and from workers).

AMQP (or similar "full message queues") doesn't work with Channels for two main reasons:

a) Running protocols through a queue like this requires incredibly low latency; the Redis solution is on the order of milliseconds, which is a speed I have personally not seen an AMQP queue reach.

b) The return channels for messages require delivery to a specific process, which is a very difficult routing story given the AMQP design structure.
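The list-plus-expiring-keys design described here can be modelled in a few lines of plain Python. This is an in-memory toy standing in for Redis (the key naming, expiry value, and the msgpack encoding are simplified or elided - only the pattern is the point):

```python
# Toy in-memory model of the asgi_redis pattern: each message body lives
# under its own expiring key, and the key *name* is pushed onto a
# per-channel list; readers pop a name, then fetch (and claim) the body.

import time
import uuid
from collections import defaultdict, deque

EXPIRY = 60  # seconds a message body stays retrievable (illustrative)

_bodies = {}                  # key name -> (expires_at, message)
_queues = defaultdict(deque)  # channel name -> queue of key names

def send(channel, message):
    key = "asgi:msg:" + uuid.uuid4().hex   # stands in for a Redis string key
    _bodies[key] = (time.time() + EXPIRY, message)
    _queues[channel].append(key)           # stands in for RPUSH

def receive(channel):
    queue = _queues[channel]
    while queue:                           # stands in for LPOP/BLPOP
        key = queue.popleft()
        expires_at, message = _bodies.pop(key, (0, None))
        if time.time() < expires_at:       # skip bodies that already expired
            return message
    return None
```

Because each body is a separate expiring key, a message that nobody picks up in time simply vanishes - which is exactly the "at most once" behaviour discussed later in this thread.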
There are some solutions, but at the end of the day you need to find a way to route dynamically-generated channel names to their correct interface servers, where the channel names change with each client. There was some work to try to get a fourth, AMQP-based backend for channels a little while back, but it proved difficult: AMQP servers are much more oriented around not losing tasks and going a bit slower, while Channels is (and must be) designed the opposite way, closer almost to a socket protocol.

> Architecturally, both of these approaches require running two processes. The current solution would run a WSGI server for short lived connections and an async server for long lived connections. Channels runs a front-end interface server, daphne, and the back-end worker servers. Which is more scalable? That's hard to say. They both scale the same way: add more processes.

I'd like to point out again that you can still run two servers with Channels if you like, and have it work as you describe, just with a standardised interprotocol communication format.

> It's my experience that handling long-lived vs. short-lived HTTP connections has different scaling needs, so it is helpful to be able to scale them independently as one might do without Channels. That distinction can't be made with Channels since all HTTP connections are handled by the interface servers.

Very good point, and why I expect some deployments to have to run different server clusters for different types and configure them differently.

> Channels has an explicit requirement of a backend/broker server which requires its own resources. While not required in the separate server setup, it's likely that there is some kind of message broker between the servers, so at best we'll call this a wash in terms of resources. However, the same is not true for latency.
> Channels will handle the same short-lived HTTP connections by serializing the request, putting it into the backend, deserializing the request, processing the response in the worker, serializing the response, putting it into the backend, deserializing the response, and sending it to the client. This is a fair bit of extra work for no real gain since there is no concept of priority or backpressure.

The backpressure point is accurate, and I'm considering giving channels a configurable capacity and an exception they raise when full, so there's more information for what workers should do.

> This latency also exists for the websocket message handling. While Channels may try to claim that it's more resilient/fault tolerant because of this messaging layer, it claims "at most once" delivery, which means that a message might never be delivered. I don't think that claim has much merit. As noted in previous discussions, sending all HTTP requests unencrypted through the Channel backend (such as Redis) raises a number of potential security/regulatory issues which have yet to be addressed.

The encryption story is very true; we have problems using Redis here at Eventbrite for the same reasons. asgi_redis will soon get support for message encryption, both on the wire and at rest in Redis, based on a symmetric key, but that's still likely not sufficient for a full enterprise deployment, where TLS tunnels are likely needed.

> One key difference to me is that pushing Channels as the new Django standard makes Django's default deployment story much more complicated. Currently this complication is the exception, not the rule. Deployment is a frequent complaint, not just from people new to Django. Deployment of Python apps is a pain, and this requires running two of them even if you aren't using websockets. To me that is a huge step in the wrong direction for Django in terms of ease of deployment and required system resources.
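(An aside on the backpressure point raised above: the configurable-capacity idea could look roughly like the sketch below. The names `ChannelFull` and `BoundedChannelLayer` are hypothetical, sketching the behaviour rather than any actual channels API.)

```python
# Hypothetical sketch of a channel layer with a configurable capacity:
# a send to a full channel raises instead of queueing unbounded work,
# so the caller gets a signal it can act on (drop, retry, return a 503).

from collections import defaultdict, deque

class ChannelFull(Exception):
    """Raised when a send would exceed the channel's capacity."""

class BoundedChannelLayer:
    def __init__(self, capacity=100):
        self.capacity = capacity
        self.queues = defaultdict(deque)

    def send(self, channel, message):
        queue = self.queues[channel]
        if len(queue) >= self.capacity:
            raise ChannelFull(channel)   # caller decides how to back off
        queue.append(message)

    def receive(self, channel):
        queue = self.queues[channel]
        return queue.popleft() if queue else None
```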
Channels doesn't change the default deployment story; indeed, you can just keep deploying as you do today. You only need to change deployment if you need websockets or long-polling, which was true in the past anyway; at least now it will be more standardised.

> Channels claims to have a better zero-downtime deployment story. However, in practice I'm not convinced that will be true. A form of graceful reload is supported by the most popular WSGI servers, so it isn't really better than what we currently have. The Channel docs note that you only need to restart the workers when deploying new code so you won't drop HTTP connections. But the interface application definition and the worker code live in the same code base. It will be difficult to determine whether or not you need to restart the interface on a given deployment, so many people will likely err on the side of restarting the interface as well.

True, we could do better documentation around this, but the interface servers would only ever need to be restarted if the one thing they import from the project (the channel layer configuration) changes.

> With a separate async server, likely in a separate code base, it would be easy to deploy them independently and only restart the websocket connections when needed.

As mentioned before, you can deploy Daphne separately to just handle WebSockets if needed.

> Also, it's better if your application can gracefully handle disconnections/reconnections for the websocket case anyway, since you'll have to deal with that reality on mobile data connections and terrible wifi.

Agreed. This is why all my examples for channels use ReconnectingWebSocket.

> There is an idea floating around of using Channels for background jobs/Celery replacement. It is not/should not be. The message delivery is not guaranteed and there is no retry support. This is explicitly outside of the stated design goals of the project.
> Allowing this idea to continue in any form does a disservice to the Django community, who may use Channels in this way. It's also a slap in the face to the Celery authors who've worked for years to build a robust system which is superior to this naive implementation.

I've always tried to be clear that it is not a Celery replacement, but instead a way to offload some non-critical task if required. I also take slight offense at what seems like a personal attack; I have nothing but the greatest respect for the Celery authors and would definitely not "slap them in the face" or think that my solution was not "naive".

> So Channels is at best on par with the existing available approaches, and at worst adds a bunch of latency, potentially dropped messages, and new points of failure, while taking up more resources and locking everyone into using Redis. It does provide a clear message framework, but in my opinion it's too naive to be useful. Given the complexity in the space, I don't trust anything built from the ground up without having a meaningful production deployment to prove it out. It has taken Kombu many years to mature and I don't think it can be rewritten easily.

a) ASGI does not lock everyone into using Redis; it just so happens that is the first backend I have written. It is designed to run against other suitable datastores or socket protocols, and we have the money to fund such an endeavour.

b) Kombu solves a different problem - that of abstracting task queues - and it would still be my first choice for that; I have used it for many years and it would continue to be my choice for task queuing.

ASGI is essentially meant to be an implementation of the CSP/Go style of message-passing interprocess communication, but cross-network rather than merely cross-thread or cross-process, as I believe that network transparency makes for a much better deployment story and the ability to build a more resilient infrastructure.
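For readers unfamiliar with the CSP/Go style referenced here, the core idea is processes blocking on named channels rather than calling each other directly. A cross-thread toy version looks like this (ASGI extends the same pattern across the network; the channel name is illustrative):

```python
# Toy CSP-style message passing: a worker blocks on a named channel,
# a producer (the "interface server" side) pushes a message onto it.
# Cross-thread here for brevity; ASGI's point is doing this cross-network.

import queue
import threading

channels = {"http.request": queue.Queue()}

def worker(results):
    msg = channels["http.request"].get()       # block until a message arrives
    results.append("handled " + msg["path"])   # pretend to process a request

results = []
t = threading.Thread(target=worker, args=(results,))
t.start()
channels["http.request"].put({"path": "/"})    # the producer side
t.join()
# results now holds ["handled /"]
```

Neither side knows about the other; they only share the channel name, which is what makes the pattern network-transparent once the queue itself lives in a shared transport.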
I would still expect people to run multiple "clusters" of interface servers, workers and ASGI channel layer servers together, and load balance between them; it is not designed to be a single bus that an entire site runs on, but a way to overcome Python's limitations and make applications multi-threaded across a large number of CPU cores and machines.

> I see literally no advantage to pushing all HTTP requests and responses through Redis. What this does enable is that you can continue to write synchronous code. To me that's based around some idea that async code is too hard for the average Django dev to write or understand. Or that nothing can be done to make parts of Django play nicer with existing async frameworks, which I also don't believe is true. Python 3.4 makes writing async Python pretty elegant, and async/await in 3.5 makes that even better.

As someone who has been writing async code for the last six years, I like to only write async code when I actually, truly need it. I think that specifically for the view- and event-oriented nature of Django code, having a solution like channels, where code is run on worker threads, fills 80% of people's needs and allows a lot less shooting of oneself in the foot (you can only stall a single "thread" when you do a blocking operation, rather than an entire process of hundreds of "threads").

I would also like to see Django get more friendly to async natively in the core, but that is a much more difficult prospect and one we can only really tackle when we drop Python 2 support. I do believe, however, that the ASGI model maps well to writing an event-based single-process program with async support; in fact, I should probably go write an asyncio in-memory backend version to make sure there are no tweaks to make.

> Sorry this is so long. Those who saw the DjangoCon author's panel know that quickly writing walls of unreadable text is my forte. It's been building for a long time.
> I have an unsent draft to Andrew from when he wrote his first blog post about this idea. I deeply regret not sending it and beginning to engage in this discussion earlier.

I would have greatly appreciated that; from my perspective, this has been a long slog of refactoring, testing and refinement, during which I've received some good feedback and acted on most of it (for example, changing some of the HTTP encoding spec, or how channel names are constructed and routed).

> It's hard for me to separate this work from the process by which it was created. Russ touched on my previous experience with the DEP process and I will admit that has jaded many of my interactions with the core team. Building consensus is hard and I'm posting this to help work towards the goal of community consensus. Thanks for taking the time to read this all the way through and I welcome any feedback.

I will put my hand up and say that this sidestepped the DEP process, and that's entirely my fault. It was not my intention; I've been working on this for over two years, and only last year did I go public with my semi-final design and start asking for feedback. I should probably have taken it into a DEP then, but failed to. The problem is likely that I kept discussing channels with various members of the core team and other people I know in the Django community, and always received implicit approval, which is a terrible way to go about being transparent.

That said, I hope that my efforts over the last year to publicise and talk about this in every available avenue have gone somewhat towards alleviating the lack of a DEP; I have never tried to smuggle this in or be quiet about it - in fact, very much the contrary. I've had the ASGI spec (which I would potentially like to push as a PEP) up for a while now, too, and have been trying actively to get feedback on it from both the Django and the wider Python community.
I hope we can resolve our differences on this and both walk away happy; you have some very valid points about deployment, reliability, and the newness of all this code, but I also believe that the path from here to having this deployed widely will be a good one.

I have been working on this problem for a long time, and between experiments both by myself and internally at Eventbrite, where our engineers tried a large number of different messaging backends for message transport (in our case, for a SOA layer, though it performs a similar function and requires similarly low latency), Redis seemed like the best choice for a first and canonical transport implementation. (AMQP, Kafka, and enterprise message buses all have different problems.)

I don't expect people to adopt Channels overnight and switch to running Daphne in front of all their traffic; if anything, I expect a lot of people will run it just for WebSockets (I likely would at the moment if faced with a very large deployment). That said, I believe it is certainly at the point where it can be included in Django, if nothing else because the very design of channels and ASGI means that the interface servers and transport layer are both improvable and swappable out of the context of Django core.

The patch to Django core is mostly routing and consumer design - an API I've tried hard to refine to make accessible for beginners while having flexibility for more advanced cases - and that's the only part that will be directly locked in stone for the future. The other components - interface servers and transport layers - exist outside the Django release cycle and have the potential for large improvement or complete replacement as the community starts using Channels and we start getting the feedback and communal knowledge that only large deployment of this kind of thing can bring.
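To make the routing-and-consumer point concrete, the shape of the idea is a mapping from channel names to consumer callables. The following toy dispatcher is illustrative only - it is not the actual Channels API, and the consumer and channel names are made up:

```python
# Toy version of the routing/consumer shape: channel names map to
# consumer callables, and a dispatcher looks up and invokes them.
# Not the real Channels API - just the concept.

def http_consumer(message):
    """A trivial consumer that 'handles' an HTTP request message."""
    return {"status": 200, "content": b"OK"}

routing = {
    "http.request": http_consumer,
}

def dispatch(channel, message):
    consumer = routing.get(channel)
    if consumer is None:
        raise KeyError("No consumer routed for channel %r" % channel)
    return consumer(message)
```

The appeal for beginners is that a consumer is just a function taking a message; the flexibility for advanced cases comes from the routing table being ordinary data you can compose and nest.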
Sorry about circumventing the DEP process and pulling this off in a very strange way; I feel particularly guilty now that it's been highlighted to me, and I know that you are yourself working on a DEP, so it probably seems like I've abused my position on the core team to pull this off. Please understand that was not my intention, and I've always wanted to have an open, frank discussion about channels in Django. In many ways, I'm glad someone has finally brought up all the things I thought would be valid counter-arguments but which haven't really been advanced yet.

Andrew

--
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAFwN1uq5B6PLvk4s7dy-Kwip80qULg-g87sLb8VNggHMo1c1Kw%40mail.gmail.com.