[
https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249930#comment-14249930
]
Ariel Weisberg commented on CASSANDRA-8457:
-------------------------------------------
#1 The wakeup is protected by a CAS, so in the common case there shouldn't be
multiple threads contending to dispatch. The synchronized block is there for
the case where the thread that is finishing dispatching signals that it is
going to sleep, meaning a dispatch task will be necessary for the next
submission. At that point it has to check the queue one more time to avoid
lost wakeups, and it is possible a new dispatch task will be created while
that is happening. The synchronized block forces the new task to wait until
the last check and drain complete. How often this race occurs and blocks a
thread, I have no idea; I could add a counter and check.
The only way to avoid it is to lock while checking the queue-empty condition
and updating the needs-wakeup field, or to have a 1:1 mapping between sockets
and dispatch threads (AKA not SEP). The first option would force producers to
lock on task submission as well. I don't see how the dispatch task can
atomically check that there is no work to do and set the needs-wakeup flag at
the same time. At that point, is there a reason to use a lock-free queue?
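To make the scheme concrete, here is a minimal, hypothetical Java sketch of the pattern described above. The names (needsWakeup, DispatchSketch) and the single-thread executor standing in for the shared executor pool are my own inventions, not the patch's identifiers: a producer only schedules a dispatch task when it wins the CAS, and the dispatcher re-checks the queue under synchronized after re-arming the flag so a submission that raced the final drain is not lost.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only -- not the actual patch's code.
class DispatchSketch {
    final ConcurrentLinkedQueue<Runnable> queue = new ConcurrentLinkedQueue<>();
    final AtomicBoolean needsWakeup = new AtomicBoolean(true);
    final AtomicInteger processed = new AtomicInteger();
    final ExecutorService executor = Executors.newSingleThreadExecutor();

    void submit(Runnable task) {
        queue.offer(task);
        // Only the producer that wins the CAS schedules a dispatch task,
        // so in the common case producers do not contend here.
        if (needsWakeup.get() && needsWakeup.compareAndSet(true, false))
            executor.execute(this::dispatch);
    }

    void dispatch() {
        // synchronized forces a dispatch task created concurrently to wait
        // until the final check-and-drain below has completed.
        synchronized (this) {
            drain();
            needsWakeup.set(true);
            // Check the queue one more time after announcing sleep: a task
            // offered between drain() and set(true) saw the flag as false
            // and scheduled no dispatch, so it must be caught here.
            drain();
        }
    }

    private void drain() {
        Runnable r;
        while ((r = queue.poll()) != null) {
            r.run();
            processed.incrementAndGet();
        }
    }
}
```

The second drain() is exactly the "check the queue one more time" step: without it, a task offered after the first drain but before set(true) would sit in the queue with no dispatch task scheduled.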
#2 I didn't replace the queue because I needed to maintain a size for the
dropped-message functionality, and I didn't want to reason about maintaining
the size non-atomically with queue operations like offer/poll/drainTo. I
could give it a whirl. I am also not sure how well Iterator.remove performs
on ConcurrentLinkedQueue, but I can check.
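For illustration, a minimal sketch of the kind of reasoning involved: pairing a ConcurrentLinkedQueue with an external AtomicInteger counter (this wrapper and its names are hypothetical, not Cassandra code). Because the counter is updated non-atomically with offer/poll, a concurrent reader can briefly observe a size that lags the queue's true contents.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper: CLQ.size() is O(n) and weakly consistent, so a
// separate counter is kept; it is only approximately in sync with the queue.
class CountedQueue<T> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
    private final AtomicInteger size = new AtomicInteger();

    public void offer(T t) {
        queue.offer(t);
        // A reader between offer() and this increment sees size lag by one.
        size.incrementAndGet();
    }

    public T poll() {
        T t = queue.poll();
        if (t != null)
            size.decrementAndGet();
        return t;
    }

    // Approximate size, e.g. for deciding whether to drop messages.
    public int size() { return size.get(); }
}
```

Whether that transient inaccuracy is acceptable for the dropped-message accounting is exactly the question raised above.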
#3 Indeed, this is a typo.
Jake, it definitely doesn't address several sources of signaling, but it
should reduce the total # of threads signaled per request.
I will profile the two versions today and then add more nodes. For
benchmarking purposes I could disable the message-dropping functionality and
use MPSCLinkedQueue from Netty.
> nio MessagingService
> --------------------
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Ariel Weisberg
> Labels: performance
> Fix For: 3.0
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big
> contributor to context switching, especially for larger clusters. Let's look
> at switching to nio, possibly via Netty.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)