[ https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249930#comment-14249930 ]

Ariel Weisberg commented on CASSANDRA-8457:
-------------------------------------------

#1 The wakeup is protected by a CAS, so in the common case there shouldn't be 
multiple threads contending to dispatch. The synchronized block is there for 
the case where the thread that is finishing up dispatching signals that it is 
going to sleep and a dispatch task will be necessary for the next submission. 
At that point it has to check the queue one more time to avoid lost wakeups, 
and it is possible a new dispatch task will be created while that is happening. 
The synchronized block forces the new task to wait while the last check and 
drain complete. How often this race occurs and blocks a thread, I don't know; 
I could add a counter and check.
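
For illustration only, here is a rough sketch of the pattern described above 
(the class, field, and method names are placeholders, not the actual patch):

{code:java}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

class DispatchSketch
{
    private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();
    // set when a dispatch task is scheduled or running; producers CAS it, so
    // in the common case only one thread ends up scheduling a dispatch
    private final AtomicBoolean dispatchScheduled = new AtomicBoolean(false);
    private final Executor executor = Executors.newCachedThreadPool();

    void submit(Runnable task)
    {
        tasks.offer(task);
        // common case: a dispatch is already scheduled, so no extra wakeup
        if (dispatchScheduled.compareAndSet(false, true))
            executor.execute(this::runDispatch);
    }

    private void runDispatch()
    {
        // a newly created dispatch task blocks here while the previous one
        // finishes its last check and drain
        synchronized (this)
        {
            for (Runnable r; (r = tasks.poll()) != null; )
                r.run();

            // signal that this task is going to sleep; a new dispatch task
            // will be needed for the next submission
            dispatchScheduled.set(false);

            // check the queue one more time to avoid a lost wakeup from a
            // producer that offered after the last poll()
            if (!tasks.isEmpty() && dispatchScheduled.compareAndSet(false, true))
                executor.execute(this::runDispatch);
        }
    }
}
{code}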

The only way to avoid it is to lock while checking the queue-empty condition 
and updating the needs-wakeup field, or to have a 1:1 mapping between sockets 
and dispatch threads (AKA not SEP). The locking approach would force producers 
to lock on task submission as well. I don't see how the dispatch task can 
otherwise atomically check that there is no work to do and set the needs-wakeup 
flag at the same time. At that point, is there a reason to use a lock-free 
queue? 
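
For contrast, a sketch of the locking alternative (again illustrative only): 
the empty check and the needs-wakeup update become atomic, but every producer 
has to take the same lock on submission, and a plain queue suffices:

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;

class LockedDispatchSketch
{
    private final Object lock = new Object();
    // no need for a lock-free queue once everything is under the lock
    private final Queue<Runnable> tasks = new ArrayDeque<>();
    private boolean needsWakeup = true;
    private final Executor executor;

    LockedDispatchSketch(Executor executor)
    {
        this.executor = executor;
    }

    void submit(Runnable task)
    {
        synchronized (lock) // every producer contends here
        {
            tasks.offer(task);
            if (needsWakeup)
            {
                needsWakeup = false;
                executor.execute(this::runDispatch);
            }
        }
    }

    private void runDispatch()
    {
        while (true)
        {
            Runnable task;
            synchronized (lock)
            {
                task = tasks.poll();
                if (task == null)
                {
                    // the empty check and the wakeup flag update are atomic
                    needsWakeup = true;
                    return;
                }
            }
            task.run();
        }
    }
}
{code}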

#2 I didn't replace the queue because I needed to maintain a size for the 
dropped-message functionality, and I didn't want to reason about maintaining 
the size non-atomically with queue operations like offer/poll/drainTo. I could 
give it a whirl. I am also not sure how well Iterator.remove() works on CLQ, 
but I can check.
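
To make the size concern concrete, here is a sketch (not the patch) of the 
bookkeeping that would be needed next to ConcurrentLinkedQueue, whose own 
size() is O(n). The counter is updated separately from offer()/poll(), so it 
is only approximate while operations are in flight, which may be tolerable for 
a dropped-message threshold but still has to be reasoned about:

{code:java}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

class CountedQueue<T>
{
    private final Queue<T> queue = new ConcurrentLinkedQueue<>();
    // maintained non-atomically with respect to the queue operations
    private final AtomicInteger size = new AtomicInteger();

    boolean offer(T t)
    {
        boolean added = queue.offer(t); // always true for an unbounded CLQ
        if (added)
            size.incrementAndGet();
        return added;
    }

    T poll()
    {
        T t = queue.poll();
        if (t != null)
            size.decrementAndGet();
        return t;
    }

    // momentarily out of sync with the queue contents, but O(1)
    int approximateSize()
    {
        return size.get();
    }
}
{code}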

#3 Indeed, this is a typo.

Jake, it definitely doesn't address several sources of signaling, but it 
should reduce the total # of threads signaled per request.

I will profile the two versions today and then add more nodes. For benchmark 
purposes, I could disable the message-dropping functionality and use 
MPSCLinkedQueue from Netty.

> nio MessagingService
> --------------------
>
>                 Key: CASSANDRA-8457
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Ariel Weisberg
>              Labels: performance
>             Fix For: 3.0
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big 
> contributor to context switching, especially for larger clusters.  Let's look 
> at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
