Gossiper Starvation
-------------------

                 Key: CASSANDRA-2253
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2253
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.7.0
         Environment: linux, windows
            Reporter: Mikael Sitruk


The Gossiper periodic task is starved when large sstable files need to be 
deleted.
SSTableDeletingReference uses the same scheduledTasks pool (from 
StorageService) as the Gossiper and other periodic tasks, but the Gossiper 
task must run every second to keep the cluster status (node liveness) 
correct. When large sstable files (several GB) are deleted, the delete 
operation can take more than 30 seconds, which puts the whole cluster into a 
wrong state where nodes are marked as down although they are alive.
This leads to unnecessary additional load such as hinted handoff, a wrong 
cluster state, and increased latency.
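
The starvation can be reproduced outside Cassandra with a small sketch. It 
is not Cassandra code: it assumes the shared pool has a single worker thread 
(the report does not state the pool size), uses a task that blocks on a latch 
as a stand-in for the slow sstable delete, and scales the one-second gossip 
interval down to 10 ms. The class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class SharedPoolStarvation {
    // Returns true if the periodic task ran while the slow task held the pool.
    static boolean heartbeatRanDuringDelete() {
        // Assumption: one worker thread shared by periodic and one-off tasks.
        ScheduledExecutorService scheduledTasks =
                Executors.newSingleThreadScheduledExecutor();
        CountDownLatch deleteStarted = new CountDownLatch(1);
        CountDownLatch deleteMayFinish = new CountDownLatch(1);
        CountDownLatch heartbeat = new CountDownLatch(1);
        try {
            // Stand-in for a slow sstable delete: holds the worker until released.
            scheduledTasks.execute(() -> {
                deleteStarted.countDown();
                try {
                    deleteMayFinish.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            deleteStarted.await();
            // Stand-in for the gossip round, scaled down from 1 s to 10 ms.
            scheduledTasks.scheduleWithFixedDelay(
                    heartbeat::countDown, 0, 10, TimeUnit.MILLISECONDS);
            // Wait 300 ms (about 30 intervals) for a single heartbeat.
            return heartbeat.await(300, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            deleteMayFinish.countDown();   // release the blocked "delete"
            scheduledTasks.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("heartbeat ran during delete: "
                + heartbeatRanDuringDelete());   // prints false
    }
}
```

The heartbeat is queued behind the blocked task and never runs until the 
delete finishes, which matches the behaviour reported here.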

One possible solution is to use separate pools for periodic and 
non-periodic tasks. 
I have implemented this change and it resolves the problem. 
I can provide a patch.
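
The idea of separate pools can be sketched as follows. This is only an 
illustration under stated assumptions, not the patch mentioned above: the 
pool names periodicTasks and oneOffTasks are hypothetical, each pool has one 
worker thread, the slow delete is simulated by a task that blocks on a latch, 
and the gossip interval is scaled down to 10 ms.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class SeparatePoolsSketch {
    // Returns true if the periodic task ran while the slow task was in progress.
    static boolean heartbeatRanDuringDelete() {
        // Hypothetical names: one pool for periodic work, one for one-off work.
        ScheduledExecutorService periodicTasks =
                Executors.newSingleThreadScheduledExecutor();
        ExecutorService oneOffTasks = Executors.newSingleThreadExecutor();
        CountDownLatch deleteStarted = new CountDownLatch(1);
        CountDownLatch deleteMayFinish = new CountDownLatch(1);
        CountDownLatch heartbeats = new CountDownLatch(3);
        try {
            // Stand-in for a slow sstable delete, now on its own pool.
            oneOffTasks.execute(() -> {
                deleteStarted.countDown();
                try {
                    deleteMayFinish.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            deleteStarted.await();
            // Stand-in for the gossip round, scaled down from 1 s to 10 ms.
            periodicTasks.scheduleWithFixedDelay(
                    heartbeats::countDown, 0, 10, TimeUnit.MILLISECONDS);
            // Three heartbeats must arrive while the delete is still blocked.
            return heartbeats.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            deleteMayFinish.countDown();   // release the blocked "delete"
            periodicTasks.shutdownNow();
            oneOffTasks.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("heartbeat ran during delete: "
                + heartbeatRanDuringDelete());   // prints true
    }
}
```

With the delete on its own pool, the periodic task keeps its schedule no 
matter how long the delete takes. Enlarging a single shared pool would only 
postpone the problem, since enough concurrent slow deletes could still occupy 
every worker.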

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
