On Thu, Dec 1, 2016 at 12:48 PM, Hank Sims <hanks...@gmail.com> wrote:
> Thanks, Andrew. A few follow-up questions:
> 1. How would one go about increasing the default maximum queue size? I saw
> some reference to this when I was researching the problem yesterday, but I
> couldn't find the setting that would change it.
You set it in the channel layer configuration in Django, like this:
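
The snippet itself is missing from this copy of the message. As a minimal sketch of what such a setting might look like, assuming the asgi_redis backend used with Channels 1.x: the project path, host, and numbers below are hypothetical placeholders, and the `capacity` / `channel_capacity` option names should be checked against the backend version you have installed.

```python
# settings.py (sketch, assuming the asgi_redis backend for Channels 1.x)
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "asgi_redis.RedisChannelLayer",
        # Hypothetical routing module; use your project's own path.
        "ROUTING": "myproject.routing.channel_routing",
        "CONFIG": {
            # Hypothetical Redis location.
            "hosts": [("localhost", 6379)],
            # Default maximum number of messages per channel
            # (the thread mentions 100 as the default).
            "capacity": 1000,
            # Assumed per-channel overrides, keyed by channel name or glob.
            "channel_capacity": {
                "websocket.send*": 200,
            },
        },
    },
}
```

Raising the capacity only buys headroom; if workers drain the channel more slowly than messages arrive, the larger queue will fill up too.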
> 2. Shouldn't there be a way to resolve the backpressure by draining the
> queue before allowing new messages to be written to it? It seems like
> cutting the connection between client and server would exacerbate the
> problem, rather than remedying it. In my particular case, it wouldn't be
> that big a deal if a block of messages were skipped. But closing the socket when
> maximum queue size is reached seems to cause a cascade of problems.
How would you propose this worked? The only alternative to closing the
socket is to buffer the messages in memory and retry sending them, at which
point you might have the case where the client thinks they have a working
connection but it's not actually delivered anything for 30 seconds. Hard
failure is preferable in distributed systems in my experience; trying to
solve the problem with soft failure and retry just makes problems even more
difficult to detect and debug.
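
To make the hard-failure idea concrete, here is a toy sketch of a bounded channel that refuses new messages once it is full, so the sender learns about the overload immediately instead of buffering silently. It is not the Channels implementation; `BoundedChannel`, `ChannelFull`, and `deliver` are hypothetical names used only for illustration.

```python
from collections import deque


class ChannelFull(Exception):
    """Raised when a bounded channel cannot accept another message."""


class BoundedChannel:
    """A toy in-memory channel with a fixed maximum number of messages."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self._messages = deque()

    def send(self, message):
        # Fail hard rather than buffer without limit.
        if len(self._messages) >= self.capacity:
            raise ChannelFull("capacity of %d reached" % self.capacity)
        self._messages.append(message)

    def receive(self):
        # Return the oldest message, or None if the channel is empty.
        return self._messages.popleft() if self._messages else None


def deliver(channel, message):
    """Return True if the message was queued, False if the caller
    should give up (for a websocket, close the connection)."""
    try:
        channel.send(message)
        return True
    except ChannelFull:
        return False
```

With a capacity of 2, the third `deliver` call returns False until a consumer calls `receive`, which is the point where a server would close the socket rather than hold the message in memory.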
> On Thursday, December 1, 2016 at 12:34:22 PM UTC-8, Andrew Godwin wrote:
>> "Backpressure" is designed exactly for what you describe, which is when
>> clients are making requests of the server faster than you can handle them.
>> Each channel has a maximum capacity of messages (100 by default), beyond
>> which trying to add a new one results in an error.
>> Webservers, when they see this, return an error to the client to try and
>> resolve the overload situation. If they didn't, then the server would clog
>> up trying to buffer all the pending requests. It's like returning a 503
>> error on a webpage when a server is overloaded.
>> To solve the situation, just provision more workers so the channel is
>> drained as fast as messages are put onto it.
>> If you want to monitor the size of channels to anticipate this stuff,
>> there's a plan for an API in ASGI that would let you do that but it's not
>> in place yet. You may look at the length of the Redis lists directly in the
>> meantime if you wish (there's one list per channel).
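
As a rough sketch of the interim approach described above, the helper below reports the length of every Redis list under a key prefix. It assumes a redis-py style client (one that provides `scan_iter`, `type`, and `llen`) and the "asgi:" prefix mentioned elsewhere in this thread; the exact key layout is an assumption and may differ between backend versions. `channel_lengths` is a hypothetical helper name.

```python
def channel_lengths(client, prefix="asgi:"):
    """Map each Redis list key under `prefix` to its current length.

    `client` is assumed to behave like a redis-py client.
    """
    lengths = {}
    for key in client.scan_iter(match=prefix + "*"):
        # Skip non-list keys such as message bodies or group sets.
        # The type comes back as bytes or str depending on client settings.
        if client.type(key) in (b"list", "list"):
            lengths[key] = client.llen(key)
    return lengths
```

With redis-py installed, this could be called as `channel_lengths(redis.StrictRedis(host="localhost", port=6379))` and polled periodically; a length that keeps approaching the configured capacity suggests more workers are needed.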
>> On Thu, Dec 1, 2016 at 11:26 AM, hank...@gmail.com <hank...@gmail.com> wrote:
>>> Can someone help me understand the concept of websocket “backpressure”
>>> in a Django Channels project? What is it? How do I diagnose it? At what level
>>> of the stack does it occur? How do I cure it? The docs are a little hazy on this.
>>> I wired up a quick Channels project for my mid-sized website. Before
>>> deploying the project, I load-tested it with thor
>>> <https://github.com/observing/thor> and started scaling up. When I
>>> reached two Daphne processes and four worker processes, it seemed like I
>>> had enough power behind the project to handle the load on my site. It was
>>> able to handle 2000 simultaneous websocket connections without errors,
>>> according to thor. That should have been more than enough.
>>> I deployed, and everything went fine for a while. After a bit, though,
>>> the websockets got slow and the server started to drop connections.
>>> Eventually the whole project stalled out. I looked through the Daphne logs
>>> and found a flurry of lines like this:
>>> 2016-12-01 14:35:14,513 WARNING WebSocket force closed for websocket.send!QbxCqPhvyxVt due to receive backpressure
>>> I restarted all the server and worker processes to no effect. I was able
>>> to put the project back online by manually deleting all the “asgi:*” keys
>>> in Redis. But then, after a while, the backpressure built up and everything
>>> crashed again.
>>> The problem, I suppose, has something to do with the high frequency of
>>> messages that were to be passed via websocket in this particular project. A
>>> click triggers a message in each direction, and people were encouraged to
>>> click rapidly. So I probably have to throttle this, or else launch more
>>> workers and/or servers.
>>> But I'd like to know what, specifically, triggers these “backpressure”
>>> disconnections, and where I might look to monitor them /before/ errors
>>> start to occur. In one of the Redis queues, I suppose? If so, which one(s)
>>> – inbound or outbound? I suppose my idea, here, is that I might be able to
>>> automatically scale up if the queues start to fill up.
>>> Thank you in advance. Fun project!
>>> You received this message because you are subscribed to the Google
>>> Groups "Django users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to django-users...@googlegroups.com.
>>> To post to this group, send email to django...@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/django-users.
>>> To view this discussion on the web visit https://groups.google.com/d/ms
>>> For more options, visit https://groups.google.com/d/optout.