[ 
https://issues.apache.org/jira/browse/CASSANDRA-12966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706147#comment-15706147
 ] 

Jason Brown commented on CASSANDRA-12966:
-----------------------------------------

I've chosen to solve this by updating {{SystemKeyspace}} in the following ways:
- remove the {{synchronized}} keyword from the functions that update the 
{{peers}} table
- in those same functions, execute the update to the {{peers}} table 
asynchronously, on a mutation stage thread, instead of blocking the current 
thread (which is often the Gossip stage thread)
- return a {{Future}} from those functions so that tests can wait for the 
update to complete
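
The three changes above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code on the linked branches: the class {{PeerUpdateSketch}}, the in-memory map standing in for the {{peers}} table, the single-threaded executor standing in for the mutation stage, and the method signatures are all assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for SystemKeyspace; not the actual Cassandra code.
public class PeerUpdateSketch
{
    // Assumed stand-in for the mutation stage: a separate executor, so the
    // calling thread (e.g. the Gossip stage thread) never blocks on the write.
    private final ExecutorService mutationStage = Executors.newSingleThreadExecutor();

    // Assumed stand-in for the peers table: peer address -> (column -> value).
    private final Map<String, Map<String, String>> peers = new ConcurrentHashMap<>();

    // No synchronized keyword: the write is handed off and the caller returns
    // immediately with a Future that completes once the write has been applied.
    public Future<?> updatePeerInfo(String peer, String column, String value)
    {
        return mutationStage.submit(() -> writeToPeersTable(peer, column, value));
    }

    // In the real system this is where the write would wait on the commit log.
    private void writeToPeersTable(String peer, String column, String value)
    {
        peers.computeIfAbsent(peer, k -> new ConcurrentHashMap<>()).put(column, value);
    }

    public String peerValue(String peer, String column)
    {
        Map<String, String> row = peers.get(peer);
        return row == null ? null : row.get(column);
    }

    public void shutdown()
    {
        mutationStage.shutdown();
    }

    public static void main(String[] args) throws Exception
    {
        PeerUpdateSketch sketch = new PeerUpdateSketch();
        Future<?> done = sketch.updatePeerInfo("10.0.0.1", "rack", "rack1");
        done.get(); // a test waits here; the Gossip stage would simply move on
        System.out.println(sketch.peerValue("10.0.0.1", "rack"));
        sketch.shutdown();
    }
}
```

In this sketch a production caller would ignore the returned {{Future}} and carry on, while a test would call {{get()}} on it before asserting on the table contents.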

||3.0||3.X||trunk||
|[branch|https://github.com/jasobrown/cassandra/tree/12966_3.0]|[branch|https://github.com/jasobrown/cassandra/tree/12966_3.X]|[branch|https://github.com/jasobrown/cassandra/tree/12966_trunk]|
|[dtest|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-12966_3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-12966_3.X-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-12966_trunk-dtest/]|
|[testall|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-12966_3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-12966_3.X-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/jasobrown/job/jasobrown-12966_trunk-testall/]|

> Gossip thread slows down when using batch commit log
> ----------------------------------------------------
>
>                 Key: CASSANDRA-12966
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12966
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>            Priority: Minor
>
> When using batch commit log mode, the Gossip thread slows down after a node 
> bounces. This is because we perform a bunch of updates to the 
> peers table via {{SystemKeyspace.updatePeerInfo}}, which is a synchronized 
> method. How long each of those individual updates takes depends on how 
> busy the system is with write traffic at the time. If the system is largely 
> quiescent, each update will be relatively quick (just waiting for the fsync). 
> If the system is getting a lot of writes, and depending on the 
> commitlog_sync_batch_window_in_ms, each of the Gossip thread's updates can 
> get stuck in the backlog, which causes the Gossip thread to stop processing. 
> We have observed in large clusters that a rolling restart triggers and 
> exacerbates this behavior. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
