[ https://issues.apache.org/jira/browse/CASSANDRA-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578128#comment-13578128 ]

Brandon Williams commented on CASSANDRA-5244:
---------------------------------------------

This is more severe than we originally thought, and causes CASSANDRA-5129 when 
there is a secondary index:

{noformat}
"CompactionExecutor:1" daemon prio=10 tid=0x00007effbc03c800 nid=0x7abf waiting for monitor entry [0x00007effc843a000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.cassandra.service.StorageService.reportSeverity(StorageService.java:905)
    - waiting to lock <0x00000000ca576ac8> (a org.apache.cassandra.service.StorageService)
    at org.apache.cassandra.db.compaction.CompactionInfo$Holder.started(CompactionInfo.java:141)
    at org.apache.cassandra.metrics.CompactionMetrics.beginCompaction(CompactionMetrics.java:90)
    at org.apache.cassandra.db.compaction.CompactionManager$9.run(CompactionManager.java:813)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
{noformat}
                
> Compactions don't work while node is bootstrapping
> --------------------------------------------------
>
>                 Key: CASSANDRA-5244
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5244
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Jouni Hartikainen
>            Assignee: Brandon Williams
>            Priority: Critical
>              Labels: gossip
>             Fix For: 1.2.2
>
>
> It seems that there is a race condition in StorageService that prevents 
> compactions from completing while a node is in the bootstrap state.
> I have been able to reproduce this multiple times by throttling streaming 
> throughput to extend the bootstrap time while simultaneously inserting data 
> to the cluster.
> The problem lies in the synchronization of the initServer(int delay) and 
> reportSeverity(double incr) methods, as they both try to acquire the instance 
> lock of StorageService through the synchronized keyword. As initServer 
> does not return until the bootstrap has completed, all calls to 
> reportSeverity block until then. However, reportSeverity is called when 
> starting compactions in CompactionInfo, and thus all compactions block until 
> bootstrap completes. 
> This might severely degrade a node's performance after bootstrap, as it might 
> have lots of compactions pending while simultaneously starting to serve reads.
> I have been able to solve the issue by adding a separate lock for 
> reportSeverity and removing its method-level synchronization. This of course 
> is not a valid approach if we must assume that any of Gossiper's 
> IEndpointStateChangeSubscribers could potentially end up calling back to 
> StorageService's synchronized methods. However, at least at the moment, that 
> does not seem to be the case.
> Maybe somebody with more experience with the codebase can come up with a 
> better solution?
> (This might affect DynamicEndpointSnitch as well, as it also calls 
> reportSeverity in its setSeverity method.)
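
The locking problem described in the report, and the separate-lock workaround the reporter mentions, can be sketched in isolation. This is not Cassandra code: the class SeverityLockSketch, the Service interface, the two implementations, and the latch-based initServer signature are all hypothetical stand-ins, and the fix that actually shipped may look different. The sketch only shows that a synchronized reportSeverity cannot run while a synchronized initServer holds the instance monitor, whereas one guarded by its own lock can.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SeverityLockSketch {

    /** Hypothetical stand-in for the two StorageService methods in the report. */
    interface Service {
        void initServer(CountDownLatch started, CountDownLatch finish);
        void reportSeverity(double incr);
    }

    /** Both methods synchronize on the instance, as the report describes. */
    static class SharedMonitorService implements Service {
        private double severity;

        public synchronized void initServer(CountDownLatch started, CountDownLatch finish) {
            started.countDown();   // signal that the instance monitor is now held
            awaitQuietly(finish);  // simulate a long bootstrap while holding it
        }

        public synchronized void reportSeverity(double incr) {
            severity += incr;
        }
    }

    /** Assumed workaround: reportSeverity uses a dedicated lock instead. */
    static class SeparateLockService implements Service {
        private final Object severityLock = new Object();
        private double severity;

        public synchronized void initServer(CountDownLatch started, CountDownLatch finish) {
            started.countDown();
            awaitQuietly(finish);
        }

        public void reportSeverity(double incr) {
            synchronized (severityLock) {
                severity += incr;
            }
        }
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if reportSeverity finishes while initServer is still running. */
    static boolean reportCompletesDuringInit(Service service) {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finish = new CountDownLatch(1);
        CountDownLatch reported = new CountDownLatch(1);

        Thread bootstrap = new Thread(() -> service.initServer(started, finish));
        Thread compaction = new Thread(() -> {
            service.reportSeverity(1.0);
            reported.countDown();
        });

        bootstrap.start();
        awaitQuietly(started);  // initServer holds the instance monitor from here on
        compaction.start();
        try {
            // Give the "compaction" thread half a second to get through.
            boolean completed = reported.await(500, TimeUnit.MILLISECONDS);
            finish.countDown(); // end the simulated bootstrap so both threads exit
            bootstrap.join();
            compaction.join();
            return completed;
        } catch (InterruptedException e) {
            finish.countDown();
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("shared monitor: " + reportCompletesDuringInit(new SharedMonitorService()));
        System.out.println("separate lock:  " + reportCompletesDuringInit(new SeparateLockService()));
    }
}
```

With the shared monitor the call is expected to time out, which corresponds to the BLOCKED state in the thread dump above; with the separate lock it should return promptly. As the reporter notes, the separate lock is only safe as long as nothing reached from reportSeverity calls back into other synchronized methods on the same instance.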

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
