[ 
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242859#comment-14242859
 ] 

Benedict commented on CASSANDRA-8457:
-------------------------------------

FTR, I strongly doubt _"context switching"_ is actually as much of a problem as 
we think, although constraining it is never a bad thing. The big hit we have is 
_thread signalling_ costs, which is a different but related beast. Certainly 
the talking point that raised this concerned system time spent serving 
"context switches", which would definitely refer to signalling, not the 
switching itself.

Now, we do use a BlockingQueue for OutboundTcpConnection, which will incur these 
costs; however, I strongly suspect the impact will be much lower than predicted, 
especially as the testing that flagged this was done on small clusters with 
RF=1, where these threads would not be exercised at all. The costs of going to 
the network itself are likely to exceed the context switching costs, and 
naturally permit messages to accumulate in the queue, reducing the number of 
signals actually needed. 
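
To illustrate the accumulation effect, here is a minimal sketch in plain 
java.util.concurrent. It is not the actual OutboundTcpConnection code; the 
class name BatchDrainSketch, the method nextBatch and the String message type 
are hypothetical. The consumer blocks once for the first message, then takes 
whatever else has queued up without blocking again, so one wake-up can serve 
many messages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BatchDrainSketch {
    // Blocks for the first message, then collects up to maxBatch - 1 more
    // that are already queued, without blocking a second time.
    static List<String> nextBatch(BlockingQueue<String> queue, int maxBatch)
            throws InterruptedException {
        List<String> batch = new ArrayList<>();
        batch.add(queue.take());            // may park; a producer must signal to wake us
        queue.drainTo(batch, maxBatch - 1); // non-blocking; no further signal required
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        // Stand-in for messages that piled up while the consumer was busy on the network
        for (int i = 0; i < 5; i++) {
            queue.put("msg-" + i);
        }
        List<String> batch = nextBatch(queue, 64);
        System.out.println(batch.size() + " messages in one wake-up"); // prints "5 messages in one wake-up"
    }
}
```

The slower the network write, the more messages are waiting by the time the 
consumer returns to the queue, and the fewer times it needs to be signalled.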

There are then the negative performance implications we have found with small 
numbers of connections under NIO to consider, which mean this change could have 
significant downsides for the majority of deployed clusters (although if we get 
batching in the client driver we may see these penalties disappear).

To establish whether there's likely a benefit to exploit, we could most likely 
refactor this code comparatively minimally (compared with rewriting to 
NIO/Netty) to make use of the SharedExecutorPool, as this would reduce the 
number of threads in flight to those actually serving work on the OTCs. This 
wouldn't affect the ITCs, but I am dubious of their contribution. We should 
probably also actually test whether this is indeed a problem on clusters at 
scale performing in-memory CL>1 reads.
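
As a rough sketch of the idea only: the snippet below uses a standard 
java.util.concurrent fixed thread pool as a stand-in and does not use or 
reproduce the SharedExecutorPool API. The names SharedPoolSketch, PeerQueue 
and run are hypothetical. Each per-peer queue submits a drain task to the 
shared pool only when it has work and no drain is already scheduled, so idle 
peers hold no thread.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class SharedPoolSketch {
    // Hypothetical stand-in for one outbound connection: a queue plus a
    // flag recording whether a drain task is scheduled or running.
    static final class PeerQueue {
        private final Queue<String> pending = new ConcurrentLinkedQueue<>();
        private final AtomicBoolean scheduled = new AtomicBoolean(false);
        private final ExecutorService pool;
        private final CountDownLatch written;

        PeerQueue(ExecutorService pool, CountDownLatch written) {
            this.pool = pool;
            this.written = written;
        }

        void enqueue(String message) {
            pending.add(message);
            // Submit a drain task only if none is scheduled for this peer
            if (scheduled.compareAndSet(false, true)) {
                pool.execute(this::drain);
            }
        }

        private void drain() {
            while (pending.poll() != null) {
                written.countDown(); // stand-in for writing the message to a socket
            }
            scheduled.set(false);
            // Re-check: a message may have arrived between the last poll and the flag reset
            if (!pending.isEmpty() && scheduled.compareAndSet(false, true)) {
                pool.execute(this::drain);
            }
        }
    }

    // Sends perPeer messages to each of `peers` queues through `threads` shared
    // workers; returns true if every message was written within five seconds.
    static boolean run(int peers, int perPeer, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch written = new CountDownLatch(peers * perPeer);
        PeerQueue[] queues = new PeerQueue[peers];
        for (int p = 0; p < peers; p++) {
            queues[p] = new PeerQueue(pool, written);
        }
        for (int i = 0; i < perPeer; i++) {
            for (PeerQueue q : queues) {
                q.enqueue("msg-" + i);
            }
        }
        boolean done = written.await(5, TimeUnit.SECONDS);
        pool.shutdown();
        return done;
    }

    public static void main(String[] args) throws InterruptedException {
        // 16 peers served by 2 threads rather than one thread per peer
        System.out.println(run(16, 100, 2) ? "all messages written" : "timed out");
    }
}
```

Whether this actually reduces signalling cost in practice is exactly what 
would need measuring; the sketch only shows that the thread count can follow 
active work rather than peer count.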


> nio MessagingService
> --------------------
>
>                 Key: CASSANDRA-8457
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Ariel Weisberg
>              Labels: performance
>             Fix For: 3.0
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big 
> contributor to context switching, especially for larger clusters.  Let's look 
> at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
