[ https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242859#comment-14242859 ]
Benedict commented on CASSANDRA-8457:
-------------------------------------
FTR, I strongly doubt _"context switching"_ is actually as much of a problem as
we think, although constraining it is never a bad thing. The big hit we take is
the cost of _thread signalling_, which is a different but related beast.
Certainly the talking point that raised this was about system time spent
serving "context switches", which would definitely refer to signalling, not
the switching itself.
Now, we do use a BlockingQueue for OutboundTcpConnection, which will incur
these costs; however, I strongly suspect the impact will be much lower than
predicted, especially as the testing done to flag this up was on small clusters
with RF=1, where these threads would not be exercised at all. The cost of
going to the network itself is likely to exceed the context switching costs,
and it naturally permits messages to accumulate in the queue, reducing the
number of signals actually needed.
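To illustrate the accumulation effect, here is a minimal sketch (not the
OutboundTcpConnection code; the class and method names are hypothetical). A
consumer blocks once with take() and then empties whatever else has piled up
with the non-blocking drainTo(), so a backlog of messages costs at most one
wake-up rather than one per message:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch, not Cassandra code: shows how a backlog in a
// BlockingQueue lets one blocking take() serve many messages.
public class BatchDrainSketch {

    /**
     * Consumes messages until `expected` have been collected into `sink`.
     * Returns the number of take() calls, i.e. the number of points at which
     * the consumer could have parked and needed a signal from a producer.
     */
    public static int consume(BlockingQueue<String> queue, int expected, List<String> sink) {
        int blockingTakes = 0;
        List<String> batch = new ArrayList<>();
        try {
            while (sink.size() < expected) {
                batch.add(queue.take());   // parks only if the queue is empty
                blockingTakes++;
                queue.drainTo(batch);      // non-blocking: grabs the rest of the backlog
                sink.addAll(batch);        // stand-in for writing the batch to the socket
                batch.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for messages", e);
        }
        return blockingTakes;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        // Simulate messages accumulating while the consumer is busy on the network.
        for (int i = 0; i < 5; i++) {
            queue.add("msg-" + i);
        }
        List<String> sent = new ArrayList<>();
        int takes = consume(queue, 5, sent);
        System.out.println(sent.size() + " messages, " + takes + " blocking take(s)");
    }
}
```

With five messages already queued, the single take() plus drainTo() collects
all of them, so the consumer never parks between messages.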
There are also the negative performance implications we have found with small
numbers of connections under NIO to consider, so this change could have
significant downsides for the majority of deployed clusters (although if we get
batching in the client driver we may see these penalties disappear).
To establish whether there is a benefit to exploit, we could most likely
refactor this code fairly minimally (compared with rewriting to NIO/Netty) to
use the SharedExecutorPool, as this would reduce the number of threads in
flight to those actually serving work on the OTCs. This wouldn't affect the
ITCs, but I am dubious of their contribution. We should probably also test
whether this is indeed a problem on clusters at scale performing in-memory
CL>1 reads.
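As a rough sketch of the shape such a refactoring might take (an assumption
for illustration only: a plain java.util.concurrent ExecutorService stands in
for the shared pool, this is not the SharedExecutorPool API, and PeerOutbox
and runDemo are hypothetical names), each peer keeps its own queue but no
dedicated thread. A drain task is scheduled on the shared pool only when a
peer has work and no drain is already running for it:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: a plain ExecutorService stands in for a shared pool.
public class SharedPoolSketch {

    /** One per peer: a queue of pending messages, but no dedicated thread. */
    static final class PeerOutbox {
        private final Queue<Integer> pending = new ConcurrentLinkedQueue<>();
        private final AtomicBoolean draining = new AtomicBoolean(false);
        private final List<Integer> sent = Collections.synchronizedList(new ArrayList<>());
        private final ExecutorService shared;

        PeerOutbox(ExecutorService shared) {
            this.shared = shared;
        }

        void enqueue(int message) {
            pending.add(message);
            // Schedule a drain only if none is running for this peer.
            if (draining.compareAndSet(false, true)) {
                shared.execute(this::drain);
            }
        }

        private void drain() {
            do {
                Integer m;
                while ((m = pending.poll()) != null) {
                    sent.add(m); // stand-in for writing to the socket
                }
                draining.set(false);
                // Re-check: a message may have arrived after the last poll()
                // but before the flag was cleared.
            } while (!pending.isEmpty() && draining.compareAndSet(false, true));
        }
    }

    /** Returns true if every peer received all of its messages, in order. */
    public static boolean runDemo(int peers, int perPeer, int threads) {
        ExecutorService shared = Executors.newFixedThreadPool(threads);
        List<PeerOutbox> outboxes = new ArrayList<>();
        for (int p = 0; p < peers; p++) {
            outboxes.add(new PeerOutbox(shared));
        }
        for (int i = 0; i < perPeer; i++) {
            for (PeerOutbox outbox : outboxes) {
                outbox.enqueue(i);
            }
        }
        shared.shutdown(); // already-submitted drain tasks still run
        try {
            if (!shared.awaitTermination(10, TimeUnit.SECONDS)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        for (PeerOutbox outbox : outboxes) {
            if (outbox.sent.size() != perPeer) {
                return false;
            }
            for (int i = 0; i < perPeer; i++) {
                if (outbox.sent.get(i) != i) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 8 peers served by 2 pool threads instead of 8 dedicated threads.
        System.out.println(runDemo(8, 1000, 2));
    }
}
```

The number of runnable threads is bounded by the pool size rather than by the
number of peers, which is the effect the experiment would need to measure.
Whether this translates into a real gain is exactly the open question.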
> nio MessagingService
> --------------------
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Ariel Weisberg
> Labels: performance
> Fix For: 3.0
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big
> contributor to context switching, especially for larger clusters. Let's look
> at switching to nio, possibly via Netty.