[
https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917174#comment-13917174
]
Christian Rolf commented on CASSANDRA-6788:
-------------------------------------------
Sorry, I should've been more specific; this happens when the number of RPC
threads is limited. We've been running a ring of 12 nodes with 2048 as max RPC
threads for over a year without problems, but the past week we've been getting
zombie nodes almost every day.
Basically, the active thread counter is decremented at line 216 (pre-patch) of
CustomTThreadPoolServer.java, which can end the waiting loop at line 98. If a
new connection is made before the run method of the old thread has completed,
the execute() call at line 108 can throw a RejectedExecutionException.
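
The interleaving can be sketched in isolation. This is only an illustration,
not the Cassandra code: the class name RaceSketch, the MAX_THREADS constant,
the latches and the counter are all hypothetical, and it assumes the server
hands connections to a ThreadPoolExecutor backed by a SynchronousQueue with
the default rejection policy. The latches force the unlucky ordering so the
rejection happens on every run.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class RaceSketch {
    // Hypothetical stand-in for the configured RPC thread limit.
    static final int MAX_THREADS = 1;

    /** Returns true if execute() was rejected although the counter showed a free slot. */
    static boolean reproduceRejection() throws InterruptedException {
        // Assumption: a bounded pool with a SynchronousQueue, so a task is
        // rejected whenever no worker thread is idle and the pool is at its maximum.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                MAX_THREADS, MAX_THREADS, 60, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>());
        AtomicInteger activeClients = new AtomicInteger();
        CountDownLatch counterDecremented = new CountDownLatch(1);
        CountDownLatch letWorkerFinish = new CountDownLatch(1);

        // First connection: occupies the only worker thread.
        activeClients.incrementAndGet();
        pool.execute(() -> {
            // The counter is decremented inside run(), before the worker
            // thread has actually been handed back to the pool.
            activeClients.decrementAndGet();
            counterDecremented.countDown();
            try {
                letWorkerFinish.await(); // simulates the tail end of run()
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Accept loop: stops waiting as soon as the counter drops below the limit.
        counterDecremented.await();
        boolean rejected = false;
        if (activeClients.get() < MAX_THREADS) {
            try {
                activeClients.incrementAndGet();
                pool.execute(() -> { }); // second connection arrives too early
            } catch (RejectedExecutionException e) {
                // If this is not caught, the exception ends the accept loop.
                activeClients.decrementAndGet();
                rejected = true;
            }
        }

        letWorkerFinish.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return rejected;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("rejected: " + reproduceRejection());
    }
}
```

In the sketch the exception is caught so the program can report it. Without
the catch block it would propagate out of the accepting thread, which matches
the silent stop described in the issue.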
> Race condition silently kills thrift server
> -------------------------------------------
>
> Key: CASSANDRA-6788
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6788
> Project: Cassandra
> Issue Type: Bug
> Reporter: Christian Rolf
> Assignee: Christian Rolf
> Attachments: race_patch.diff
>
>
> There's a race condition in CustomTThreadPoolServer that can cause the thrift
> server to silently stop listening for connections.
> It happens when the executor service throws a RejectedExecutionException,
> which is not caught.
>
> The failure is silent in the sense that OpsCenter doesn't notice any problem,
> since JMX is still running fine.