[
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967568#comment-15967568
]
Sylvain Lebresne commented on CASSANDRA-8457:
---------------------------------------------
bq. I think it's important that a single slow node or network issue resulting
in a socket that isn't writable shouldn't allow an arbitrary amount of data to
collect on the heap. Right now there is nothing that can drop the data in that
scenario.
I don't necessarily disagree with that somewhat general statement, but I'm far
from convinced that checking for expired messages is the right tool for the job
in the first place. Expiration is time-based and the default timeouts are on
the order of seconds, which leaves plenty of time for messages to accumulate
and blow the heap without any of them being droppable. On top of that, not all
messages have timeouts, which actually makes sense: a message timeout isn't a
back-pressure mechanism, it's about how long we're willing to wait for an
answer to a request message, and hence one-way messages have no reason to have
such a timeout. And that's part of the point: I dislike using a concept that
isn't meant to be related to back-pressure to do back-pressure, especially
when it's as flawed as this one. Users shouldn't have to worry that nodes
could OOM because they set a high write timeout; it's just not intuitive.
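To put a rough number on that, here is a back-of-the-envelope sketch. The class
name, method name, and all figures are hypothetical and chosen only for
illustration; none of them come from Cassandra or from a measurement.

```java
// Hypothetical helper, not Cassandra code: estimates how many bytes can pile
// up on the heap before the oldest queued message becomes droppable, when
// expiration is the only mechanism that can drop anything.
final class ExpirationBound {
    static long bytesQueuedBeforeFirstExpiry(long messagesPerSecond,
                                             long bytesPerMessage,
                                             long timeoutSeconds) {
        // Nothing expires until timeoutSeconds have elapsed, so everything
        // produced in that window is retained. multiplyExact guards overflow.
        long bytesPerSecond = Math.multiplyExact(messagesPerSecond, bytesPerMessage);
        return Math.multiplyExact(bytesPerSecond, timeoutSeconds);
    }

    public static void main(String[] args) {
        // Assumed workload: 50,000 messages/s of 1 KiB each, 2 s timeout.
        System.out.println(bytesQueuedBeforeFirstExpiry(50_000, 1024, 2));
    }
}
```

The retained volume grows linearly with the timeout, so raising the timeout
raises the heap exposure with it, and a one-way message with no timeout has no
bound at all under this scheme.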
Don't get me wrong, I agree that some back-pressure mechanism should be added
for that problem, but it should be based on the amount of message data (or, at
the very least, the number of messages) sitting in the Netty queue. Surely
we're not the only ones facing this problem, though: doesn't Netty already
have a standard way to deal with messages piling up in its queues?
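For what it's worth, Netty does track pending outbound bytes per channel and
flips Channel.isWritable() against configurable high and low water marks
(WriteBufferWaterMark). The sketch below is plain Java with no Netty
dependency; it only illustrates that byte-based idea. ByteBoundedOutbox and
its methods are hypothetical names, not Cassandra or Netty APIs, and rejecting
a message in offer() is just one possible policy (a caller could equally block
or drop).

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical byte-bounded outbound queue, for illustration only.
final class ByteBoundedOutbox {
    private final Queue<byte[]> pending = new ArrayDeque<>();
    private final long lowWaterMark;   // resume accepting at or below this many bytes
    private final long highWaterMark;  // stop accepting once queued bytes exceed this
    private long queuedBytes = 0;
    private boolean writable = true;

    ByteBoundedOutbox(long lowWaterMark, long highWaterMark) {
        if (lowWaterMark < 0 || lowWaterMark > highWaterMark)
            throw new IllegalArgumentException("need 0 <= low <= high");
        this.lowWaterMark = lowWaterMark;
        this.highWaterMark = highWaterMark;
    }

    // Returns false, without queueing, while the outbox is over its byte budget.
    synchronized boolean offer(byte[] message) {
        if (!writable) return false;
        pending.add(message);
        queuedBytes += message.length;
        if (queuedBytes > highWaterMark) writable = false;
        return true;
    }

    // Called as the socket drains; returns null when nothing is queued.
    synchronized byte[] poll() {
        byte[] message = pending.poll();
        if (message != null) {
            queuedBytes -= message.length;
            if (queuedBytes <= lowWaterMark) writable = true;
        }
        return message;
    }

    synchronized boolean isWritable() { return writable; }

    synchronized long queuedBytes() { return queuedBytes; }
}
```

The gap between the two marks is there to avoid flapping: once the producer
has been told to stop, it is not told to resume until the queue has drained
well below the point where it was cut off. Heap usage per peer is then bounded
by the high water mark plus one message, regardless of any timeout setting.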
> nio MessagingService
> --------------------
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Jonathan Ellis
> Assignee: Jason Brown
> Priority: Minor
> Labels: netty, performance
> Fix For: 4.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big
> contributor to context switching, especially for larger clusters. Let's look
> at switching to nio, possibly via Netty.