[
https://issues.apache.org/jira/browse/CASSANDRA-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Oleg Kibirev updated CASSANDRA-5456:
------------------------------------
Comment: was deleted
(was: Copying bootstrapTokens rather than holding a lock on the same for entire
loop)
> Large number of bootstrapping nodes cause gossip to stop working
> ----------------------------------------------------------------
>
> Key: CASSANDRA-5456
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5456
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.1.10
> Reporter: Oleg Kibirev
>
> A long-running section of code in PendingRangeCalculatorService is synchronized
> on bootstrapTokens. Gossip waits for the same lock, so it stops working when a
> large number of nodes (hundreds in our case) are bootstrapping. Consequently,
> the whole cluster becomes non-functional.
> I experimented with the following change in
> PendingRangeCalculatorService.java, and it resolved the problem in our case.
> The prior code held the synchronized block around the entire for loop; the
> change copies bootstrapTokens under the lock and iterates over the copy.
> synchronized (bootstrapTokens) {
>     bootstrapTokens = new LinkedHashMap<Token, InetAddress>(bootstrapTokens);
> }
> for (Map.Entry<Token, InetAddress> entry : bootstrapTokens.entrySet())
> {
>     InetAddress endpoint = entry.getValue();
>     allLeftMetadata.updateNormalToken(entry.getKey(), endpoint);
>     for (Range<Token> range : strategy.getAddressRanges(allLeftMetadata).get(endpoint))
>         pendingRanges.put(range, endpoint);
>     allLeftMetadata.removeEndpoint(endpoint);
> }
>
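For readers who want to try the pattern in isolation, here is a minimal,
self-contained sketch of "copy under the lock, iterate outside it". It is not
Cassandra code: the class SnapshotUnderLock, its methods, and the
Integer/String map standing in for bootstrapTokens are all hypothetical, and
the shared map is assumed to be one whose writers synchronize on the map
object itself (as Collections.synchronizedMap does).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SnapshotUnderLock {
    // Hold the lock only long enough to copy the map (hypothetical helper).
    static <K, V> Map<K, V> snapshot(Map<K, V> shared) {
        synchronized (shared) {
            return new LinkedHashMap<>(shared);
        }
    }

    // Stand-in for the long-running loop: it works on the private copy,
    // so writers to the shared map are not blocked while it runs.
    static List<String> describe(Map<Integer, String> shared) {
        Map<Integer, String> copy = snapshot(shared);
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, String> e : copy.entrySet()) {
            out.add(e.getValue() + "@" + e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumption: token -> endpoint, simplified to Integer -> String.
        Map<Integer, String> shared =
            Collections.synchronizedMap(new LinkedHashMap<>());
        shared.put(10, "10.0.0.1");
        shared.put(20, "10.0.0.2");
        System.out.println(describe(shared));
    }
}
```

The trade-off is that the loop sees a point-in-time view: entries added to the
shared map after the copy is taken are not processed until the next run.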