[ https://issues.apache.org/jira/browse/CASSANDRA-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-5244:
--------------------------------------

    Priority: Minor  (was: Critical)

Thanks for the detective work, Jouni.  I'll let Brandon comment on solutions; 
in the meantime, marking Minor since, while inconvenient, this does not 
compromise correctness.
                
> Compactions don't work while node is bootstrapping
> --------------------------------------------------
>
>                 Key: CASSANDRA-5244
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5244
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.1
>            Reporter: Jouni Hartikainen
>            Priority: Minor
>
> It seems that there is a race condition in StorageService that prevents 
> compactions from completing while a node is in the bootstrap state.
> I have been able to reproduce this multiple times by throttling streaming 
> throughput to extend the bootstrap time while simultaneously inserting data 
> into the cluster.
> The problem lies in the synchronization of the initServer(int delay) and 
> reportSeverity(double incr) methods, as they both try to acquire the instance 
> lock of StorageService through the use of the synchronized keyword. As 
> initServer does not return until the bootstrap has completed, all calls to 
> reportSeverity block until then. However, reportSeverity is called when 
> starting compactions in CompactionInfo, and thus all compactions block until 
> bootstrap completes.
> This might severely degrade a node's performance after bootstrap, as it might 
> have lots of compactions pending while simultaneously starting to serve reads.
> I have been able to solve the issue by adding a separate lock for 
> reportSeverity and removing its method-level synchronization. This of course 
> is not a valid approach if we must assume that any of Gossiper's 
> IEndpointStateChangeSubscribers could potentially end up calling back to 
> StorageService's synchronized methods. However, at least at the moment, that 
> does not seem to be the case.
> Maybe somebody with more experience with the codebase can come up with a 
> better solution?
> (This might affect DynamicEndpointSnitch as well, as it also calls 
> reportSeverity in its setSeverity method)
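
The blocking described above can be reproduced in isolation. The following is a minimal sketch, not Cassandra code: the Service class is a hypothetical stand-in for StorageService, the latches simulate a long-running bootstrap, and the useSeparateLock flag and the 500 ms wait are assumptions made purely for illustration. It shows that a synchronized reportSeverity cannot proceed while initServer holds the instance monitor, whereas a dedicated lock (the workaround the reporter describes) lets it through.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SeverityLockSketch {

    /** Hypothetical stand-in for StorageService; not the real Cassandra class. */
    static class Service {
        private final boolean useSeparateLock;
        private final Object severityLock = new Object();
        private double severity;

        // Latches used only to simulate a bootstrap that takes a long time.
        final CountDownLatch bootstrapStarted = new CountDownLatch(1);
        final CountDownLatch bootstrapMayFinish = new CountDownLatch(1);

        Service(boolean useSeparateLock) {
            this.useSeparateLock = useSeparateLock;
        }

        // Holds the instance monitor for the whole simulated bootstrap.
        // CountDownLatch.await() does not release the monitor.
        synchronized void initServer() throws InterruptedException {
            bootstrapStarted.countDown();
            bootstrapMayFinish.await();
        }

        // With useSeparateLock == false this behaves like a synchronized
        // method (locks on this); with true it uses a dedicated lock.
        void reportSeverity(double incr) {
            Object lock = useSeparateLock ? severityLock : this;
            synchronized (lock) {
                severity += incr;
            }
        }
    }

    /** Returns true if reportSeverity finished while bootstrap was still running. */
    static boolean reportCompletesDuringBootstrap(boolean useSeparateLock) {
        Service service = new Service(useSeparateLock);
        CountDownLatch reported = new CountDownLatch(1);

        Thread bootstrap = new Thread(() -> {
            try {
                service.initServer();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // Plays the role of a compaction that reports severity when it starts.
        Thread compaction = new Thread(() -> {
            service.reportSeverity(1.0);
            reported.countDown();
        });

        try {
            bootstrap.start();
            service.bootstrapStarted.await();   // initServer now owns the monitor
            compaction.start();

            // Give the compaction thread some time to get past reportSeverity.
            boolean finished = reported.await(500, TimeUnit.MILLISECONDS);

            service.bootstrapMayFinish.countDown();  // let the bootstrap end
            bootstrap.join();
            compaction.join();
            return finished;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("instance lock: " + reportCompletesDuringBootstrap(false));
        System.out.println("separate lock: " + reportCompletesDuringBootstrap(true));
    }
}
```

The sketch only demonstrates the lock contention. It does not address the caveat raised in the description, namely that a separate lock is unsafe if a Gossiper subscriber calls back into StorageService's synchronized methods.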

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
