On Thu, May 5, 2016 at 12:34 PM, Mark Lavin <markdla...@gmail.com> wrote:
> After somewhat hijacking another thread (https://groups.google.com/d/msg/django-developers/t_zuh9ucSP4/eJ4TlEDMCAAJ), I thought it was best to start fresh and clearly spell out my feelings about the Channels proposal. To start, this discussion of "Django needs a websocket story" reminds me very much of the discussions about NoSQL support. There were proofs of concept made and sky-is-falling arguments about how Django would fail without MongoDB support. But in the end the community concluded that `pip install pymongo` was the correct way to integrate MongoDB into a Django project. In the same way, it has been possible for quite some time to incorporate websockets into a Django project by running a separate server dedicated to handling those connections in a framework such as Twisted, Tornado, aiohttp, etc., and establishing a clear means by which the two servers communicate with one another as needed by the application. Now this is quite vague and ad hoc, but it does work. To me this is the measuring stick by which to judge Channels. In what ways is it better or worse than running a separate server process for long-lived vs. short-lived HTTP connections?

The main gains are (in my opinion):

- The same server process can serve both HTTP and WebSockets without path prefixing (auto-negotiation based on the Upgrade header); without this you need an extra web layer in front to route requests to the right backend server.
- HTTP long-polling is supported via the same mechanism (like WebSockets, it does not fit inside the WSGI paradigm in a performant way).
- You get to run fewer processes overall.

That said, I don't see everyone running over to use Daphne in production, which is why it's entirely reasonable to run two servers: one for HTTP and one for WebSockets. Channels fully supports this, whether you run the HTTP servers as self-contained WSGI servers or make them forward onto the ASGI layer via the adapter.
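To illustrate the Upgrade-header auto-negotiation mentioned above, here is a minimal sketch (a hypothetical helper, not Daphne's actual code) of how a single interface server can decide whether an incoming request is plain HTTP or a WebSocket handshake:

```python
# Hypothetical sketch: deciding HTTP vs WebSocket from request headers.
# Not Daphne's actual implementation - just the negotiation idea.

def pick_protocol(headers):
    """Return 'websocket' if the request asks to upgrade, else 'http'."""
    # Header names (and these particular values) are case-insensitive,
    # so normalise everything to lowercase before comparing.
    normalized = {k.lower(): v.lower() for k, v in headers.items()}
    wants_upgrade = "upgrade" in normalized.get("connection", "")
    if wants_upgrade and normalized.get("upgrade") == "websocket":
        return "websocket"
    return "http"
```

A server doing this on one port needs no fronting proxy to split traffic; without it, you would have to configure your web layer to route, say, a path prefix to the separate async server.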
> At the application development level, Channels has the advantage of a clearly defined interprocess communication which would otherwise need to be written. However, the Channel API is built more around a simple queue/list rather than a full messaging layer. The choices of backends are currently limited to in-memory (not suitable for production), the ORM DB (not suitable for production), and Redis. While Redis PUB/SUB is nice for fanout/broadcast messaging, it isn't a proper message queue. It also doesn't support TLS out of the box. For groups/broadcast the Redis Channel backend also doesn't use PUB/SUB but instead emulates that feature. It likely can't use PUB/SUB due to the choice of sharding. This seemingly ignores robust existing solutions like Kombu, which is designed around AMQP concepts. Kombu supports far more transports than the Channel backends while emulating the same features, such as groups/fanout, and more, such as topic exchanges, QoS, message acknowledgement, compression, and additional serialization formats.

Firstly, nothing in channels uses pub/sub - channels deliver to a single reader of a queue, and thus cannot be built on a broadcast solution like pub/sub. asgi_redis, the backend you're discussing, instead uses Redis lists containing the names of expiring Redis string keys, with the data encoded using msgpack; it uses LPOP or BLPOP to wait on the queue and get messages. It has built-in sharding support based on consistent hashing (with separate handling for messages to and from workers).

AMQP (or similar "full message queues") doesn't work with Channels for two main reasons:

a) Running protocols through a queue like this requires incredibly low latency; the Redis solution is on the order of milliseconds, which is a speed I have personally not seen an AMQP queue reach.

b) The return channels for messages require delivery to a specific process, which is a very difficult routing story given the AMQP design structure.
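The list-plus-expiring-keys design described here can be modelled in a few lines of plain Python. This is an in-memory toy standing in for Redis (the key naming, expiry value, and the msgpack encoding are simplified or elided - only the pattern is the point):

```python
# Toy in-memory model of the asgi_redis pattern: each message body lives
# under its own expiring key, and the key *name* is pushed onto a
# per-channel list; readers pop a name, then fetch (and claim) the body.

import time
import uuid
from collections import defaultdict, deque

EXPIRY = 60  # seconds a message body stays retrievable (illustrative)

_bodies = {}                  # key name -> (expires_at, message)
_queues = defaultdict(deque)  # channel name -> queue of key names

def send(channel, message):
    key = "asgi:msg:" + uuid.uuid4().hex   # stands in for a Redis string key
    _bodies[key] = (time.time() + EXPIRY, message)
    _queues[channel].append(key)           # stands in for RPUSH

def receive(channel):
    queue = _queues[channel]
    while queue:                           # stands in for LPOP/BLPOP
        key = queue.popleft()
        expires_at, message = _bodies.pop(key, (0, None))
        if time.time() < expires_at:       # skip bodies that already expired
            return message
    return None
```

Because each body is a separate expiring key, a message that nobody picks up in time simply vanishes - which is exactly the "at most once" behaviour discussed later in this thread.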
There are some solutions, but at the end of the day you need to find a way to route dynamically-generated channel names to their correct interface servers, where the channel names change with each client. There was some work to try to get a fourth, AMQP-based backend for channels a little while back, but it proved difficult: AMQP servers are much more oriented around not losing tasks and going a bit slower, while Channels is (and must be) designed the opposite way, closer almost to a socket protocol.

> Architecturally, both of these approaches require running two processes. The current solution would run a WSGI server for short lived connections and an async server for long lived connections. Channels runs a front-end interface server, daphne, and the back-end worker servers. Which is more scalable? That's hard to say. They both scale the same way: add more processes.

I'd like to point out again that you can still run two servers with Channels if you like, and have it work as you describe, just with a standardised interprotocol communication format.

> It's my experience that handling long-lived vs. short-lived HTTP connections has different scaling needs, so it is helpful to be able to scale them independently as one might do without Channels. That distinction can't be made with Channels since all HTTP connections are handled by the interface servers.

Very good point, and why I expect some deployments to have to run different server clusters for different types and configure them differently.

> Channels has an explicit requirement of a backend/broker server which requires its own resources. While not required in the separate server setup, it's likely that there is some kind of message broker between the servers, so at best we'll call this a wash in terms of resources. However, the same is not true for latency.
> Channels will handle the same short-lived HTTP connections by serializing the request, putting it into the backend, deserializing the request, processing the response in the worker, serializing the response, putting it into the backend, deserializing the response, and sending it to the client. This is a fair bit of extra work for no real gain since there is no concept of priority or backpressure.

The backpressure point is accurate, and I'm considering giving channels a configurable capacity and an exception they raise when full, so there's more information for what workers should do.

> This latency also exists for the websocket message handling. While Channels may try to claim that it's more resilient/fault tolerant because of this messaging layer, it claims "at most once" delivery, which means that a message might never be delivered. I don't think that claim has much merit. As noted in previous discussions, sending all HTTP requests unencrypted through the Channel backend (such as Redis) raises a number of potential security/regulatory issues which have yet to be addressed.

The encryption story is very true; we have problems using Redis here at Eventbrite for the same reasons. asgi_redis will soon get support for message encryption, both on the wire and at rest in Redis, based on a symmetric key, but that's still likely not sufficient for a full enterprise deployment, where TLS tunnels are likely needed.

> One key difference to me is that pushing Channels as the new Django standard makes Django's default deployment story much more complicated. Currently this complication is the exception, not the rule. Deployment is a frequent complaint, not just from people new to Django. Deployment of Python apps is a pain, and this requires running two of them even if you aren't using websockets. To me that is a huge step in the wrong direction for Django in terms of ease of deployment and required system resources.
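(An aside on the backpressure point raised above: the configurable-capacity idea could look roughly like the sketch below. The names `ChannelFull` and `BoundedChannelLayer` are hypothetical, sketching the behaviour rather than any actual channels API.)

```python
# Hypothetical sketch of a channel layer with a configurable capacity:
# a send to a full channel raises instead of queueing unbounded work,
# so the caller gets a signal it can act on (drop, retry, return a 503).

from collections import defaultdict, deque

class ChannelFull(Exception):
    """Raised when a send would exceed the channel's capacity."""

class BoundedChannelLayer:
    def __init__(self, capacity=100):
        self.capacity = capacity
        self.queues = defaultdict(deque)

    def send(self, channel, message):
        queue = self.queues[channel]
        if len(queue) >= self.capacity:
            raise ChannelFull(channel)   # caller decides how to back off
        queue.append(message)

    def receive(self, channel):
        queue = self.queues[channel]
        return queue.popleft() if queue else None
```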
Channels doesn't change the default deployment story; indeed, you can just keep deploying as you do today. You only need to change deployment if you need websockets or long-polling, which was true in the past anyway; at least now it will be more standardised.

> Channels claims to have a better zero-downtime deployment story. However, in practice I'm not convinced that will be true. A form of graceful reload is supported by the most popular WSGI servers, so it isn't really better than what we currently have. The Channel docs note that you only need to restart the workers when deploying new code so you won't drop HTTP connections. But the interface application definition and the worker code live in the same code base. It will be difficult to determine whether or not you need to restart the interface on a given deployment, so many people will likely err on the side of restarting the interface as well.

True, we could do better documentation around this, but the interface servers would only ever need to be restarted if the one thing they import from the project (the channel layer configuration) changes.

> With a separate async server, likely in a separate code base, it would be easy to deploy them independently and only restart the websocket connections when needed.

As mentioned before, you can deploy Daphne separately to just handle WebSockets if needed.

> Also, it's better if your application can gracefully handle disconnections/reconnections for the websocket case anyway, since you'll have to deal with that reality on mobile data connections and terrible wifi.

Agreed. This is why all my examples for channels use ReconnectingWebSocket.

> There is an idea floating around of using Channels for background jobs/Celery replacement. It is not/should not be. The message delivery is not guaranteed and there is no retry support. This is explicitly outside of the stated design goals of the project.
> Allowing this idea to continue in any form does a disservice to the Django community, who may use Channels in this way. It's also a slap in the face to the Celery authors who've worked for years to build a robust system which is superior to this naive implementation.

I've always tried to be clear that it is not a Celery replacement, but instead a way to offload some non-critical task if required. I also take slight offense at what seems like a personal attack; I have nothing but the greatest respect for the Celery authors and would definitely not "slap them in the face" or think that my solution was not "naive".

> So Channels is at best on par with the existing available approaches, and at worst adds a bunch of latency, potentially dropped messages, and new points of failure, while taking up more resources and locking everyone into using Redis. It does provide a clear message framework, but in my opinion it's too naive to be useful. Given the complexity in the space, I don't trust anything built from the ground up without having a meaningful production deployment to prove it out. It has taken Kombu many years to mature and I don't think it can be rewritten easily.

a) ASGI does not lock everyone into using Redis; it just so happens that is the first backend I have written. It is designed to run against other suitable datastores or socket protocols, and we have the money to fund such an endeavour.

b) Kombu solves a different problem - that of abstracting task queues - and it would still be my first choice for that; I have used it for many years and it would continue to be my choice for task queuing.

ASGI is essentially meant to be an implementation of the CSP/Go style of message-passing interprocess communication, but cross-network rather than merely cross-thread or cross-process, as I believe that network transparency makes for a much better deployment story and the ability to build a more resilient infrastructure.
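For readers unfamiliar with the CSP/Go style referenced here, the core idea is processes blocking on named channels rather than calling each other directly. A cross-thread toy version looks like this (ASGI extends the same pattern across the network; the channel name is illustrative):

```python
# Toy CSP-style message passing: a worker blocks on a named channel,
# a producer (the "interface server" side) pushes a message onto it.
# Cross-thread here for brevity; ASGI's point is doing this cross-network.

import queue
import threading

channels = {"http.request": queue.Queue()}

def worker(results):
    msg = channels["http.request"].get()       # block until a message arrives
    results.append("handled " + msg["path"])   # pretend to process a request

results = []
t = threading.Thread(target=worker, args=(results,))
t.start()
channels["http.request"].put({"path": "/"})    # the producer side
t.join()
# results now holds ["handled /"]
```

Neither side knows about the other; they only share the channel name, which is what makes the pattern network-transparent once the queue itself lives in a shared transport.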
I would still expect people to run multiple "clusters" of interface servers, workers and ASGI channel layer servers together, and load balance between them; it is not designed to be a single bus that an entire site runs on, but a way to overcome Python's limitations and make applications multi-threaded across a large number of CPU cores and machines.

> I see literally no advantage to pushing all HTTP requests and responses through Redis. What this does enable is that you can continue to write synchronous code. To me that's based around some idea that async code is too hard for the average Django dev to write or understand. Or that nothing can be done to make parts of Django play nicer with existing async frameworks, which I also don't believe is true. Python 3.4 makes writing async Python pretty elegant, and async/await in 3.5 makes that even better.

As someone who has been writing async code for the last six years, I like to only write async code when I actually, truly need it. I think that specifically for the view- and event-oriented nature of Django code, having a solution like channels, where code is run on worker threads, fills 80% of people's needs and allows a lot less shooting of oneself in the foot (you can only stall a single "thread" when you do a blocking operation, rather than an entire process of hundreds of "threads").

I would also like to see Django get more friendly to async natively in the core, but that is a much more difficult prospect and one we can only really tackle when we drop Python 2 support. I do believe, however, that the ASGI model maps well to writing an event-based single-process program with async support; in fact, I should probably go write an asyncio in-memory backend version to make sure there are no tweaks to make.

> Sorry this is so long. Those who saw the DjangoCon author's panel know that quickly writing walls of unreadable text is my forte. It's been building for a long time.
> I have an unsent draft to Andrew from when he wrote his first blog post about this idea. I deeply regret not sending it and beginning to engage in this discussion earlier.

I would have greatly appreciated that; from my perspective, this has been a long slog of refactoring, testing and refinement, during which I've received some good feedback and acted on most of it (for example, changing some of the HTTP encoding spec, or how channel names are constructed and routed).

> It's hard for me to separate this work from the process by which it was created. Russ touched on my previous experience with the DEP process and I will admit that has jaded many of my interactions with the core team. Building consensus is hard and I'm posting this to help work towards the goal of community consensus. Thanks for taking the time to read this all the way through and I welcome any feedback.

I will put my hand up and say that this sidestepped the DEP process, and that's entirely my fault. It was not my intention; I've been working on this for over two years, and only last year did I go public with my semi-final design and start asking for feedback. I should probably have taken it into a DEP then, but failed to. The problem is likely that I kept discussing channels with various members of the core team and other people I know in the Django community, and always received implicit approval, which is a terrible way to go about being transparent.

That said, I hope that my efforts over the last year to publicise and talk about this in every available avenue have gone somewhat towards alleviating the lack of a DEP; I have never tried to smuggle this in or be quiet about it - in fact, very much the contrary. I've had the ASGI spec (which I would potentially like to push as a PEP) up for a while now, too, and have been trying actively to get feedback on it from both the Django and the wider Python community.
I hope we can resolve our differences on this and both walk away happy; you have some very valid points about deployment, reliability, and the newness of all this code, but I also believe that the path from here to having this deployed widely will be a good one.

I have been working on this problem for a long time, and between experiments both by myself and internally at Eventbrite, where our engineers tried a large number of different messaging backends for message transport (in our case, for a SOA layer, though it performs a similar function and requires similarly low latency), Redis seemed like the best choice for a first and canonical transport implementation. (AMQP, Kafka, and enterprise message buses all have different problems.)

I don't expect people to adopt Channels overnight and switch to running Daphne in front of all their traffic; if anything, I expect a lot of people will run it just for WebSockets (I likely would at the moment if faced with a very large deployment). That said, I believe it is certainly at the point where it can be included in Django, if nothing else because the very design of channels and ASGI means that the interface servers and transport layer are both improvable and swappable out of the context of Django core.

The patch to Django core is mostly routing and consumer design - an API I've tried hard to refine to make accessible for beginners while having flexibility for more advanced cases - and that's the only part that will be directly locked in stone for the future. The other components - interface servers and transport layers - exist outside the Django release cycle and have the potential for large improvement or complete replacement as the community starts using Channels and we start getting the feedback and communal knowledge that only large deployment of this kind of thing can bring.
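To make the routing-and-consumer point concrete, the shape of the idea is a mapping from channel names to consumer callables. The following toy dispatcher is illustrative only - it is not the actual Channels API, and the consumer and channel names are made up:

```python
# Toy version of the routing/consumer shape: channel names map to
# consumer callables, and a dispatcher looks up and invokes them.
# Not the real Channels API - just the concept.

def http_consumer(message):
    """A trivial consumer that 'handles' an HTTP request message."""
    return {"status": 200, "content": b"OK"}

routing = {
    "http.request": http_consumer,
}

def dispatch(channel, message):
    consumer = routing.get(channel)
    if consumer is None:
        raise KeyError("No consumer routed for channel %r" % channel)
    return consumer(message)
```

The appeal for beginners is that a consumer is just a function taking a message; the flexibility for advanced cases comes from the routing table being ordinary data you can compose and nest.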
Sorry about circumventing the DEP process and pulling this off in a very strange way; I feel particularly guilty now that it's been highlighted to me, and I know that you are yourself working on a DEP, so it probably seems like I've abused my position on the core team to pull this off. Please understand that was not my intention, and I've always wanted to have an open, frank discussion about channels in Django. In many ways, I'm glad someone has finally brought up all the things I thought would be valid counter-arguments but which haven't really been advanced yet.

Andrew

--
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAFwN1uq5B6PLvk4s7dy-Kwip80qULg-g87sLb8VNggHMo1c1Kw%40mail.gmail.com.