[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967621#comment-15967621
 ] 

Jason Brown commented on CASSANDRA-8457:
----------------------------------------

bq. doesn't Netty already have a standard way to deal with that problem of 
messages piling up in its queues?

Netty has a high/low water mark mechanism that tracks the number of bytes 
buffered in the channel and sends a "writability changed" event through the 
channel once one of those thresholds has been crossed. I'm currently using that 
feature in {{MessageOutHandler#channelWritabilityChanged()}} to know when we've 
buffered a decent amount of data before we explicitly call flush. Beyond this, 
I do not think netty has any other explicit back pressure mechanism built in 
(like a handler or something similar).
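
As a rough illustration of the water mark behavior described above, here is a 
plain-JDK model. It is a sketch, not Netty code: the class 
{{WaterMarkModel}}, its method names, and the 32 KiB / 64 KiB thresholds are 
all hypothetical, and it assumes the channel turns unwritable once pending 
bytes exceed the high mark and writable again once they fall below the low 
mark.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-JDK model of high/low water mark writability. Hypothetical class,
// not Netty code: it only mimics the threshold behavior described above.
final class WaterMarkModel {
    private final long lowWaterMark;
    private final long highWaterMark;
    private final List<Boolean> events = new ArrayList<>();
    private long pendingBytes;
    private boolean writable = true;

    WaterMarkModel(long lowWaterMark, long highWaterMark) {
        if (lowWaterMark < 0 || highWaterMark < lowWaterMark) {
            throw new IllegalArgumentException("need 0 <= low <= high");
        }
        this.lowWaterMark = lowWaterMark;
        this.highWaterMark = highWaterMark;
    }

    // Bytes were queued for sending. Crossing above the high mark flips the
    // channel to unwritable and records one "writability changed" event.
    void write(long bytes) {
        pendingBytes += bytes;
        if (writable && pendingBytes > highWaterMark) {
            writable = false;
            events.add(Boolean.FALSE);
        }
    }

    // Bytes were flushed to the socket. Dropping below the low mark flips
    // the channel back to writable and records another event.
    void flushed(long bytes) {
        pendingBytes = Math.max(0, pendingBytes - bytes);
        if (!writable && pendingBytes < lowWaterMark) {
            writable = true;
            events.add(Boolean.TRUE);
        }
    }

    boolean isWritable() { return writable; }
    long pendingBytes() { return pendingBytes; }
    List<Boolean> events() { return events; }

    public static void main(String[] args) {
        WaterMarkModel channel = new WaterMarkModel(32 * 1024, 64 * 1024);
        channel.write(70 * 1024);
        System.out.println("writable after queueing 70 KiB: " + channel.isWritable());
        channel.flushed(50 * 1024);
        System.out.println("writable after flushing 50 KiB: " + channel.isWritable());
    }
}
```

The gap between the two marks is what keeps the event from firing on every 
write: the state only changes after a real swing in buffered data.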

We could expand our use of the high/low water mark to say "if there's greater 
than <wildly large> number of bytes in the channel, drop 'some' messages". If 
we want to drop older messages for which we feel the client has timed out (or 
reasonably soon will), we'll have to do something like what I've proposed in 
my most recent comments (and which I'm working on right now). This message 
expiration behavior then becomes not only about timeouts (I still think 
there's reasonable use in that), but also a way to bound the size of data in 
the channel.

One other thing we can do is bound the number of tasks that can be queued 
in the channel 
([{{SingleThreadEventLoop#DEFAULT_MAX_PENDING_TASKS}}|https://github.com/netty/netty/blob/4.1/transport/src/main/java/io/netty/channel/SingleThreadEventLoop.java#L35]).
I quickly traced the netty code, and I *think* a 
{{RejectedExecutionException}} is thrown when you try to add a message to a 
channel that is filled to its capacity. I'm not sure we want to use this as 
the only backpressure mechanism, but, as unbounded queues are awful (the 
default queue size is {{Integer.MAX_VALUE}}), it might not be a bad idea to 
bound this to at least something sane (16k-32k as an upper bound can't be 
unreasonable for a single channel). This, of course, would require us to be 
more resilient to dropped messages on the enqueuing side, which is probably a 
good idea anyway.
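
The enqueuing-side handling can be sketched with the JDK alone. This example 
uses a single-threaded {{ThreadPoolExecutor}} with a bounded queue as a 
stand-in for a bounded event loop; it does not exercise Netty, and whether 
Netty rejects in the same way is the unverified assumption noted above. The 
class and method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// JDK stand-in for a single-threaded event loop with a bounded task queue.
// Hypothetical sketch: shows a caller surviving RejectedExecutionException.
final class BoundedEnqueueSketch {

    // Submits `attempts` tasks while the single worker is stalled and
    // returns how many of them were rejected because the queue was full.
    static int countRejected(int queueCapacity, int attempts) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor loop = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity));
        int rejected = 0;
        try {
            // Occupy the only worker thread so later tasks must queue.
            loop.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            for (int i = 0; i < attempts; i++) {
                try {
                    loop.execute(() -> { });
                } catch (RejectedExecutionException e) {
                    // The enqueuing side treats this as a dropped message.
                    rejected++;
                }
            }
        } finally {
            release.countDown();
            loop.shutdown();
            try {
                loop.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + countRejected(4, 10));
    }
}
```

The point of the sketch is that rejection surfaces as an exception on the 
caller's thread, so the sender has to decide what a dropped message means 
(count it, retry later, or fail the request) rather than assume the enqueue 
always succeeds.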

> nio MessagingService
> --------------------
>
>                 Key: CASSANDRA-8457
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: netty, performance
>             Fix For: 4.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big 
> contributor to context switching, especially for larger clusters.  Let's look 
> at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
