[ 
https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13917174#comment-13917174
 ] 

Christian Rolf commented on CASSANDRA-6788:
-------------------------------------------

Sorry, I should've been more specific: this happens when the number of RPC 
threads is limited. We've been running a ring of 12 nodes with 2048 as max RPC 
threads for over a year without problems, but over the past week we've been 
getting zombie nodes almost every day.

Basically, the active thread counter is decremented at line 216 (pre-patch) of 
CustomTThreadPoolServer.java, which can end the waiting loop at line 98. If a 
new connection is made before the run method of the old thread has completed, 
the execute() call at line 108 can throw a RejectedExecutionException.
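
To illustrate the window described above, here is a minimal standalone sketch. 
It is not the Cassandra code: the class name, the pool size of 1, the 
SynchronousQueue hand-off, and the latches that force the interleaving are all 
assumptions made for the demonstration. It only shows that a counter 
decremented inside run() can report a free slot while the worker thread is 
still busy, so that the next execute() is rejected.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of Cassandra.
class RejectedExecutionSketch {

    /** Returns true if the second submission was rejected by the pool. */
    static boolean reproduce() {
        // Assumed setup: one worker and no queue capacity, standing in for a
        // pool whose maximum RPC thread count has been reached.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 60, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
        AtomicInteger activeClients = new AtomicInteger(0);
        CountDownLatch counterDecremented = new CountDownLatch(1);
        CountDownLatch letWorkerFinish = new CountDownLatch(1);

        activeClients.incrementAndGet();
        pool.execute(() -> {
            // The counter drops while run() is still executing ...
            activeClients.decrementAndGet();
            counterDecremented.countDown();
            try {
                // ... so the worker thread stays occupied a little longer.
                letWorkerFinish.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        boolean rejected = false;
        try {
            counterDecremented.await();
            // The accept loop now sees a free slot in the counter and hands
            // the new connection to the pool, which has no idle thread yet.
            if (activeClients.get() < 1) {
                pool.execute(() -> { });
            }
        } catch (RejectedExecutionException e) {
            // Catching here lets an accept loop keep running instead of dying.
            rejected = true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            letWorkerFinish.countDown();
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("second execute() rejected: " + reproduce());
    }
}
```

In the sketch the exception is caught so the calling loop survives; without 
the catch block it would propagate out of the accepting thread, which matches 
the silent stop described in the issue.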

> Race condition silently kills thrift server
> -------------------------------------------
>
>                 Key: CASSANDRA-6788
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6788
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Christian Rolf
>            Assignee: Christian Rolf
>         Attachments: race_patch.diff
>
>
> There's a race condition in CustomTThreadPoolServer that can cause the thrift 
> server to silently stop listening for connections. 
> It happens when the executor service throws a RejectedExecutionException, 
> which is not caught.
>  
> Silent in the sense that OpsCenter doesn't notice any problem since JMX is 
> still running fine.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
