[
https://issues.apache.org/jira/browse/CASSANDRA-13630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138305#comment-16138305
]
Jason Brown commented on CASSANDRA-13630:
-----------------------------------------
bq. I thought worst case memory amplification from this NIO approach was 2x
message size which is worse than our current 1x message size, but it's not,
it's cluster size * message size if a message is fanned out to all nodes in the
cluster.
We do not have 1x amplification in pre-4.0 code; it's always been messageSize
times the number of target peers. In `OutboundTcpConnection` we wrote into a
[backing buffer of
64k|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L457]
for each outbound peer and flushed when the buffer filled up (see
`BufferedDataOutputStreamPlus`). The cost of the amplification is hidden by
that reusable backing buffer, but it's still there.
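To make the arithmetic concrete, here is a minimal sketch of the per-peer buffering described above. It is not Cassandra code: `PeerConnection`, `totalBytesSerialized`, and the sizes used are hypothetical, and the "flush" simply clears the buffer instead of writing to a socket. It only illustrates that the same message is serialized once per target peer, even though each peer's resident footprint stays at one 64k buffer.

```java
final class FanOutAmplification {
    // Hypothetical stand-in for one outbound connection with a reusable 64k buffer.
    static final class PeerConnection {
        final java.nio.ByteBuffer backing = java.nio.ByteBuffer.allocate(64 * 1024);
        long bytesSerialized = 0;

        // Copy the message into the backing buffer, flushing whenever it fills up.
        void write(byte[] message) {
            int offset = 0;
            while (offset < message.length) {
                int n = Math.min(backing.remaining(), message.length - offset);
                backing.put(message, offset, n);
                offset += n;
                bytesSerialized += n;
                if (!backing.hasRemaining()) {
                    flush();
                }
            }
        }

        // A real connection would write to the socket here; the sketch only resets the buffer.
        void flush() {
            backing.clear();
        }
    }

    // Total bytes written across all peers for a single fanned-out message.
    static long totalBytesSerialized(int peers, int messageSize) {
        byte[] message = new byte[messageSize];
        long total = 0;
        for (int i = 0; i < peers; i++) {
            PeerConnection connection = new PeerConnection();
            connection.write(message);
            connection.flush();
            total += connection.bytesSerialized;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalBytesSerialized(5, 200_000));
    }
}
```

The work scales with messageSize times the number of peers; the reusable buffer only caps how much of it is resident per peer at any one time.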
With CASSANDRA-8457, everything gets its own distinct buffer, allocated once
per-message, which is serialized to and then flushed. With this ticket we'll
move back to the previous model where there's a backing buffer that's used for
aggregating small messages or chunks of larger messages. That buffer, of
course, is not reused, but that's because of the asynchronous nature of NIO vs
blocking IO.
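A rough sketch of that aggregation model, under stated assumptions: `AggregatingBufferSketch`, its `handOff` method, and the `flushed` list are hypothetical names, and the list stands in for an asynchronous channel write. Because the channel may still be reading a buffer after it is handed off, the sketch allocates a fresh one instead of clearing and reusing it.

```java
final class AggregatingBufferSketch {
    private final int capacity;
    // Stand-in for buffers handed to an asynchronous channel.
    final java.util.List<java.nio.ByteBuffer> flushed = new java.util.ArrayList<>();
    private java.nio.ByteBuffer current;

    AggregatingBufferSketch(int capacity) {
        this.capacity = capacity;
        this.current = java.nio.ByteBuffer.allocate(capacity);
    }

    // Small messages accumulate in the current buffer; large ones are split into chunks.
    void write(byte[] message) {
        int offset = 0;
        while (offset < message.length) {
            int n = Math.min(current.remaining(), message.length - offset);
            current.put(message, offset, n);
            offset += n;
            if (!current.hasRemaining()) {
                handOff();
            }
        }
    }

    // Hand any pending bytes to the "channel" and start a new buffer.
    void flush() {
        handOff();
    }

    private void handOff() {
        if (current.position() == 0) {
            return; // nothing pending
        }
        current.flip();
        flushed.add(current);
        // The async write may still own the old buffer, so it is not reused.
        current = java.nio.ByteBuffer.allocate(capacity);
    }

    public static void main(String[] args) {
        AggregatingBufferSketch out = new AggregatingBufferSketch(16);
        out.write(new byte[5]);
        out.write(new byte[40]);
        out.flush();
        System.out.println(out.flushed.size());
    }
}
```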
(FTR, I have thought about moving serialization outside of the "outbound
connections" (either `OutboundTcpConnection` or netty handlers) - where we
serialize before sending to the outbound channels and send a slice of a buffer
to those channels. That way you only serialize once (less repetitive CPU work),
as well as potentially consume less memory. But I think that's a different
ticket.)
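As an illustration of the serialize-once idea in the parenthetical above, here is a minimal sketch using plain `java.nio.ByteBuffer`. The class and method names (`SerializeOnceSketch`, `serialize`, `viewsForPeers`) are hypothetical, and the "serialization" is just UTF-8 encoding. `ByteBuffer.duplicate()` gives each peer its own position and limit over the same underlying bytes, so the payload is encoded once and never copied per peer.

```java
final class SerializeOnceSketch {
    // Serialize the payload exactly once into a read-only buffer.
    static java.nio.ByteBuffer serialize(String payload) {
        byte[] bytes = payload.getBytes(java.nio.charset.StandardCharsets.UTF_8);
        java.nio.ByteBuffer buffer = java.nio.ByteBuffer.allocate(bytes.length);
        buffer.put(bytes);
        buffer.flip();
        return buffer.asReadOnlyBuffer();
    }

    // Give every peer an independent view (own position/limit) over the shared bytes.
    static java.util.List<java.nio.ByteBuffer> viewsForPeers(java.nio.ByteBuffer serialized, int peers) {
        java.util.List<java.nio.ByteBuffer> views = new java.util.ArrayList<>(peers);
        for (int i = 0; i < peers; i++) {
            views.add(serialized.duplicate());
        }
        return views;
    }

    public static void main(String[] args) {
        java.nio.ByteBuffer serialized = serialize("mutation payload");
        for (java.nio.ByteBuffer view : viewsForPeers(serialized, 3)) {
            System.out.println(view.remaining());
        }
    }
}
```

Draining one peer's view leaves the others untouched, which is what would let slow and fast peers share the same serialized bytes. With pooled or reference-counted buffers, the lifetime of the shared buffer would also have to be tracked, which this sketch does not attempt.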
bq. I really wonder if that be a shared pool of threads and we size it
generously
Yeah, I thought about this. The problem is that because the deserialization is
blocking, you basically need one thread in the pool for each "blocker"; else
you starve some deserialization activities. Hence, I just used a background
thread. Not my favorite choice, but I'm not sure a "well-sized" pool will be
sufficient.
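The starvation concern can be reproduced with a small, self-contained sketch. This is not the patch's code: `BlockingPoolSketch` and `allStarted` are hypothetical names, and a `CountDownLatch` stands in for a deserializer blocked waiting on more input. Each task signals that it has started and then blocks; with fewer pool threads than blockers, the extra tasks sit in the queue and never start.

```java
final class BlockingPoolSketch {
    // Returns true if every blocking task got a thread within the timeout.
    static boolean allStarted(int poolSize, int blockers, long timeoutMillis) {
        java.util.concurrent.ExecutorService pool =
            java.util.concurrent.Executors.newFixedThreadPool(poolSize);
        java.util.concurrent.CountDownLatch started =
            new java.util.concurrent.CountDownLatch(blockers);
        java.util.concurrent.CountDownLatch release =
            new java.util.concurrent.CountDownLatch(1);
        try {
            for (int i = 0; i < blockers; i++) {
                pool.execute(() -> {
                    started.countDown();
                    try {
                        release.await(); // stands in for a blocking read of more input
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return started.await(timeoutMillis, java.util.concurrent.TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown();
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("pool of 2, 3 blockers: " + allStarted(2, 3, 200));
        System.out.println("pool of 3, 3 blockers: " + allStarted(3, 3, 2000));
    }
}
```

In the first call the third task never gets a thread, which is the starvation described above; only a pool with one thread per blocker lets all of them make progress.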
Reading over your comments on the code itself this morning.
> support large internode messages with netty
> -------------------------------------------
>
> Key: CASSANDRA-13630
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13630
> Project: Cassandra
> Issue Type: Task
> Components: Streaming and Messaging
> Reporter: Jason Brown
> Assignee: Jason Brown
> Fix For: 4.0
>
>
> As part of CASSANDRA-8457, we decided to punt on large messages to reduce the
> scope of that ticket. However, we still need that functionality to ship a
> correctly operating internode messaging subsystem.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)