Benedict commented on CASSANDRA-15013:

There is a potential semantic change here that could have negative consequences 
for clients, so we need to be careful.  

Presently, clients can expect a response to every message they send to the 
server while they are connected; the server handles time outs, failures etc. 
and makes sure to respond to the client in some way (unless the connection is 
lost).  If we change this, we would need to corroborate client behaviour - do 
they all gracefully timeout outstanding requests, for instance, or do they kill 
them on a per-connection basis?

In either case, though, it is a shame to discard work that we've gone to the 
effort of performing.  From a cluster stability perspective, this is more 
likely to put the cluster under steadily more pressure, rather than less, as 
the client will no doubt want to retry this work.  It is better to either 
discard work that has yet to be initiated, or to try not to discard work at all 
and provide back pressure signals to the client.

I think it would also be more relevant as an initial step to remove the 
blocking behaviour on incoming, so that the eventLoop can always service the 
outgoing queue to prevent this build up.  There's a strong chance the build up 
of outgoing messages you see is down to the eventLoop that must process it 
being blocked on offering work to the {{requestExecutor}}, and by removing this 
block the outgoing queue will not accumulate so readily.

There are two options if we do this: stop reading from incoming channels when 
the {{requestExecutor}} is full, or throw {{OverloadedException}}.  In my 
opinion, this is exactly what TCP back pressure is for, but we also have a 
world where clients have been depending on the server trying its best to never 
push back, so they have inadequate queueing models internally, with no support 
for noticing or handling this back pressure.  

This to me is a design flaw that should be addressed in clients, but we could 
mitigate it for now by increasing the size of our {{requestExecutor}} queue 
(which is actually unnecessarily small), or even making it unbounded and simply 
tracking the total number of bytes we have read off the wire but not answered.  
Perhaps we could even make the behaviour of {{OverloadedException}} vs back 
pressure a connection-configurable option, so that clients with poor flow 
control can utilise {{OverloadedException}} to handle this, and those with 
better control can use normal TCP flow control mechanisms.

What do you think?

> Message Flusher queue can grow unbounded, potentially running JVM out of 
> memory
> -------------------------------------------------------------------------------
>                 Key: CASSANDRA-15013
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15013
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Client
>            Reporter: Sumanth Pasupuleti
>            Assignee: Sumanth Pasupuleti
>            Priority: Major
>             Fix For: 4.0, 3.0.x, 3.11.x
>         Attachments: heap dump showing each ImmediateFlusher taking upto 
> 600MB.png
> This is a follow-up ticket out of CASSANDRA-14855, to make the Flusher queue 
> bounded, since, in the current state, items get added to the queue without 
> any checks on queue size, nor with any checks on netty outbound buffer to 
> check the isWritable state.
> We are seeing this issue hit our production 3.0 clusters quite often.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to