[
https://issues.apache.org/jira/browse/CASSANDRA-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13578128#comment-13578128
]
Brandon Williams commented on CASSANDRA-5244:
---------------------------------------------
This is more severe than we originally thought, and causes CASSANDRA-5129 when
there is a secondary index:
{noformat}
"CompactionExecutor:1" daemon prio=10 tid=0x00007effbc03c800 nid=0x7abf waiting for monitor entry [0x00007effc843a000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.cassandra.service.StorageService.reportSeverity(StorageService.java:905)
    - waiting to lock <0x00000000ca576ac8> (a org.apache.cassandra.service.StorageService)
    at org.apache.cassandra.db.compaction.CompactionInfo$Holder.started(CompactionInfo.java:141)
    at org.apache.cassandra.metrics.CompactionMetrics.beginCompaction(CompactionMetrics.java:90)
    at org.apache.cassandra.db.compaction.CompactionManager$9.run(CompactionManager.java:813)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
{noformat}
> Compactions don't work while node is bootstrapping
> --------------------------------------------------
>
> Key: CASSANDRA-5244
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5244
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.2.0 beta 1
> Reporter: Jouni Hartikainen
> Assignee: Brandon Williams
> Priority: Critical
> Labels: gossip
> Fix For: 1.2.2
>
>
> It seems that there is a race condition in StorageService that prevents
> compactions from completing while the node is in a bootstrap state.
> I have been able to reproduce this multiple times by throttling streaming
> throughput to extend the bootstrap time while simultaneously inserting data
> to the cluster.
> The problem lies in the synchronization of the initServer(int delay) and
> reportSeverity(double incr) methods, as they both try to acquire the instance
> lock of StorageService through the use of the synchronized keyword. As initServer
> does not return until the bootstrap has completed, all calls to
> reportSeverity will block until then. However, reportSeverity is called when
> starting compactions in CompactionInfo and thus all compactions block until
> bootstrap completes.
> This might severely degrade the node's performance after bootstrap as it might
> have lots of compactions pending while simultaneously starting to serve reads.
> I have been able to solve the issue by adding a separate lock for
> reportSeverity and removing its class level synchronization. This of course
> is not a valid approach if we must assume that any of Gossiper's
> IEndpointStateChangeSubscribers could potentially end up calling back to
> StorageService's synchronized methods. However, at least at the moment, that
> does not seem to be the case.
> Maybe somebody with more experience with the codebase can come up with a
> better solution?
> (This might affect DynamicEndpointSnitch as well, as it also calls
> reportSeverity in its setSeverity method)
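
The contention described in the report, and the separate-lock workaround the reporter mentions, can be reproduced in isolation. The following is only a minimal sketch, not Cassandra code: the classes SharedMonitorService and SeparateLockService, the Service interface, the latches, and the reportsDuringBootstrap helper are all hypothetical stand-ins. initServer here simply holds the instance monitor until a latch is released, which is an assumption standing in for a long-running bootstrap.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SeverityLockSketch {

    // Hypothetical stand-in for the two StorageService methods named in the report.
    interface Service {
        void initServer(CountDownLatch started, CountDownLatch bootstrapDone) throws InterruptedException;
        void reportSeverity(double incr);
    }

    // Both methods are synchronized on the instance, as described in the report.
    static class SharedMonitorService implements Service {
        private double severity;

        public synchronized void initServer(CountDownLatch started, CountDownLatch bootstrapDone)
                throws InterruptedException {
            started.countDown();     // signal that the instance monitor is now held
            bootstrapDone.await();   // simulated bootstrap: keeps holding the monitor
        }

        public synchronized void reportSeverity(double incr) {
            severity += incr;
        }
    }

    // The workaround from the report: reportSeverity guards its state with its own lock.
    static class SeparateLockService implements Service {
        private final Object severityLock = new Object();
        private double severity;

        public synchronized void initServer(CountDownLatch started, CountDownLatch bootstrapDone)
                throws InterruptedException {
            started.countDown();
            bootstrapDone.await();
        }

        public void reportSeverity(double incr) {
            synchronized (severityLock) {
                severity += incr;
            }
        }
    }

    // Returns true if reportSeverity completes while initServer is still running.
    static boolean reportsDuringBootstrap(Service service) throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch bootstrapDone = new CountDownLatch(1);
        CountDownLatch reported = new CountDownLatch(1);

        Thread bootstrap = new Thread(() -> {
            try {
                service.initServer(started, bootstrapDone);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // Plays the role of a compaction thread calling reportSeverity on start.
        Thread compaction = new Thread(() -> {
            service.reportSeverity(1.0);
            reported.countDown();
        });

        bootstrap.start();
        started.await();                  // initServer is inside its synchronized body
        compaction.start();
        boolean ranInTime = reported.await(1, TimeUnit.SECONDS);

        bootstrapDone.countDown();        // end the simulated bootstrap so both threads exit
        bootstrap.join();
        compaction.join();
        return ranInTime;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("shared monitor: " + reportsDuringBootstrap(new SharedMonitorService()));
        System.out.println("separate lock:  " + reportsDuringBootstrap(new SeparateLockService()));
    }
}
```

With the shared monitor, the simulated compaction thread should stay blocked until the simulated bootstrap ends, matching the BLOCKED state in the thread dump above. With the separate lock it should proceed immediately. As the report notes, the separate lock is only safe as long as nothing reached from reportSeverity calls back into methods synchronized on the instance.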