Gossiper thread deadlock
------------------------

                 Key: CASSANDRA-778
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-778
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 0.6
            Reporter: Gary Dusbabek
            Assignee: Gary Dusbabek
             Fix For: 0.6
         Attachments: 0001-fix-deadlock.patch

Found this while attempting to bootstrap a node with more than a trivial amount 
of data:

Found one Java-level deadlock:
=============================
"GMFD:1":
  waiting to lock monitor 0x0000000100861d60 (object 0x00000001066a7ed8, a org.apache.cassandra.service.StorageService),
  which is held by "main"
"main":
  waiting to lock monitor 0x0000000100860710 (object 0x0000000106c7c968, a org.apache.cassandra.gms.Gossiper),
  which is held by "GMFD:1"

Java stack information for the threads listed above:
===================================================
"GMFD:1":
        at org.apache.cassandra.service.StorageService.getReplicationStrategy(StorageService.java:226)
        - waiting to lock <0x00000001066a7ed8> (a org.apache.cassandra.service.StorageService)
        at org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:634)
        at org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:502)
        at org.apache.cassandra.service.StorageService.onChange(StorageService.java:445)
        at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:812)
        at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:607)
        at org.apache.cassandra.gms.Gossiper.handleNewJoin(Gossiper.java:582)
        at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:649)
        - locked <0x0000000106c7c968> (a org.apache.cassandra.gms.Gossiper)
        at org.apache.cassandra.gms.Gossiper$GossipDigestAck2VerbHandler.doVerb(Gossiper.java:1061)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:637)
"main":
        at org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:861)
        - waiting to lock <0x0000000106c7c968> (a org.apache.cassandra.gms.Gossiper)
        at org.apache.cassandra.service.StorageService.startBootstrap(StorageService.java:347)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:318)
        - locked <0x00000001066a7ed8> (a org.apache.cassandra.service.StorageService)
        at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:99)
        at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:174)

Found 1 deadlock.

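The main thread acquires the StorageService (SS) lock in initServer and does
not release it before attempting to acquire the Gossiper lock in
addLocalApplicationState.  Meanwhile, the gossip stage (GMFD:1) acquires the
Gossiper lock in applyStateLocally and then attempts to acquire the SS lock in
getReplicationStrategy.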
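
The lock-order inversion described above can be reproduced in isolation.  The
following is only a minimal sketch, not Cassandra code: the class, the two
ReentrantLock fields, and the thread names are hypothetical stand-ins for the
StorageService and Gossiper monitors, and tryLock with a timeout is used in
place of synchronized so that the demo reports the cycle instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderInversionDemo {
    // Hypothetical stand-ins for the StorageService and Gossiper monitors.
    static final ReentrantLock ssLock = new ReentrantLock();
    static final ReentrantLock gossiperLock = new ReentrantLock();

    // Holds `first`, waits until the other thread holds its own first lock,
    // then tries `second` with a timeout instead of blocking forever.
    static boolean holdThenTry(ReentrantLock first, ReentrantLock second,
                               CountDownLatch bothHold, CountDownLatch bothTried) {
        first.lock();
        try {
            bothHold.countDown();
            bothHold.await();
            boolean got = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (got) {
                second.unlock();
            }
            bothTried.countDown();
            bothTried.await(); // keep `first` until the other thread has tried too
            return got;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            first.unlock();
        }
    }

    // Returns {mainLikeGotGossiperLock, gossipLikeGotSsLock}.
    static boolean[] run() {
        CountDownLatch bothHold = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] got = new boolean[2];
        // Same order as "main" in the dump: SS lock first, then Gossiper lock.
        Thread mainLike = new Thread(
            () -> got[0] = holdThenTry(ssLock, gossiperLock, bothHold, bothTried),
            "main-like");
        // Same order as "GMFD:1" in the dump: Gossiper lock first, then SS lock.
        Thread gossipLike = new Thread(
            () -> got[1] = holdThenTry(gossiperLock, ssLock, bothHold, bothTried),
            "GMFD-like");
        mainLike.start();
        gossipLike.start();
        try {
            mainLike.join();
            gossipLike.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return got;
    }

    public static void main(String[] args) {
        boolean[] got = run();
        System.out.println("main-like acquired Gossiper lock: " + got[0]);
        System.out.println("GMFD-like acquired SS lock: " + got[1]);
    }
}
```

Because each thread keeps its first lock until the other has finished its
attempt, neither tryLock can succeed.  With plain synchronized blocks in the
same order, both threads would block indefinitely, as in the dump.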

One solution is to use finer-grained locking on the shared resource in SS (the
map of replication strategies); another is to move that collection to a
different class (DatabaseDescriptor, maybe?).  This deadlock was introduced in
CASSANDRA-620.
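
As an illustration of the finer-grained option, the sketch below guards only
the strategy map rather than the whole service object.  It is an assumption
about one possible shape of the fix, not the contents of the attached patch:
StrategyRegistry, Strategy, getStrategy and the "Keyspace1" key are all
hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class StrategyRegistry {
    // Hypothetical stand-in for a replication strategy object.
    static final class Strategy {
        final String table;
        Strategy(String table) { this.table = table; }
    }

    // The map carries its own fine-grained synchronization.
    private final Map<String, Strategy> strategies = new ConcurrentHashMap<>();

    // Deliberately NOT synchronized on `this`: a caller that already holds
    // another lock never has to wait for this object's monitor.
    Strategy getStrategy(String table) {
        return strategies.computeIfAbsent(table, Strategy::new);
    }

    // Returns true if a lookup from another thread completes while the
    // calling thread still holds this registry's monitor.
    static boolean lookupWhileMonitorHeld() {
        StrategyRegistry registry = new StrategyRegistry();
        Strategy[] result = new Strategy[1];
        Thread gossipLike = new Thread(
            () -> result[0] = registry.getStrategy("Keyspace1"), "GMFD-like");
        synchronized (registry) {
            gossipLike.start();
            try {
                gossipLike.join(2000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return !gossipLike.isAlive() && result[0] != null;
        }
    }

    public static void main(String[] args) {
        System.out.println("lookup completed: " + lookupWhileMonitorHeld());
    }
}
```

Because the lookup no longer needs the service-wide monitor, a thread that holds
the Gossiper lock can read a strategy while another thread is inside a
synchronized section of the service, which breaks the cycle shown in the dump.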

