Gossiper Starvation
-------------------
Key: CASSANDRA-2253
URL: https://issues.apache.org/jira/browse/CASSANDRA-2253
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 0.7.0
Environment: linux, windows
Reporter: Mikael Sitruk
The Gossiper's periodic task is starved when large sstable files need to be
deleted.
SSTableDeletingReference uses the same scheduledTasks pool (from
StorageService) as the Gossiper and other periodic tasks, but the Gossiper
task must run every second to keep the cluster status (liveness of nodes)
correct. When large sstable files (several GB) are deleted, the delete
operation can take more than 30 seconds, which puts the whole cluster into a
wrong state where nodes are marked as dead although they are alive.
This leads to unneeded additional load such as hinted handoff, a wrong cluster
state, and increased latency.
One possible solution is to use separate pools for periodic and non-periodic
tasks.
I have implemented this change and it resolves the problem.
I can provide a patch.
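
To illustrate the starvation and the separate-pool idea outside of Cassandra, here is a minimal standalone sketch using only java.util.concurrent. It is not the patch mentioned above and does not use Cassandra's classes: the class name SeparatePoolsSketch, the method heartbeatsDuringSlowTask, the single-threaded pools, and the 20 ms / 400 ms timings are all assumptions chosen for illustration. A counter stands in for the gossip round and a sleep stands in for a slow sstable delete.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SeparatePoolsSketch {

    /**
     * Returns how many heartbeat ticks ran while a slow task was executing.
     * With separatePools == false the slow task shares the heartbeat's pool.
     */
    public static int heartbeatsDuringSlowTask(boolean separatePools) throws Exception {
        // Hypothetical stand-in for the shared scheduled-tasks pool (one thread).
        ScheduledExecutorService periodic = Executors.newScheduledThreadPool(1);
        // Either a dedicated pool for slow one-off work, or the same pool again.
        ScheduledExecutorService nonPeriodic =
                separatePools ? Executors.newScheduledThreadPool(1) : periodic;
        AtomicInteger heartbeats = new AtomicInteger();
        try {
            // Stand-in for the gossip round: a cheap task that must run often.
            periodic.scheduleWithFixedDelay(
                    heartbeats::incrementAndGet, 0, 20, TimeUnit.MILLISECONDS);

            // Stand-in for deleting a large file: blocks its thread for a while
            // and reports how many heartbeats happened in the meantime.
            Future<Integer> ticks = nonPeriodic.submit(() -> {
                int before = heartbeats.get();
                Thread.sleep(400);
                return heartbeats.get() - before;
            });
            return ticks.get();
        } finally {
            periodic.shutdownNow();
            if (nonPeriodic != periodic) {
                nonPeriodic.shutdownNow();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("shared pool:    " + heartbeatsDuringSlowTask(false));
        System.out.println("separate pools: " + heartbeatsDuringSlowTask(true));
    }
}
```

In the shared case the slow task occupies the pool's only thread, so no heartbeat can run until it finishes. With a dedicated pool for the slow work, the heartbeat keeps ticking at its normal interval.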
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira