[ 
https://issues.apache.org/jira/browse/CASSANDRA-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876489#action_12876489
 ] 

Matthew F. Dennis commented on CASSANDRA-1160:
----------------------------------------------

I should have been more clear about that.  The failures in 
StorageServiceClientTest are because files from 
DatabaseDescriptor.getAllDataFileLocations() exist (the test asserts they do 
not).  This is because when I moved start into <init> I passed the same 
generation number that that initServer() uses, but initClient (used by the 
test) passes a generation number based on currentTimeMillis() which doesn't 
cause the file paths to exist.  It wasn't clear to me there was an easy way to 
determine what generation to use.

More importantly, this lead me to notice that both StorageService and 
StorageLoadBalancer are registered with Gossiper before start is called.  If 
start was moved into <init> I was afraid of introducing new race conditions 
(e.g. Gossiper starting up, doing things that would have resulted in 
publications and then later receiving the registration for such publications).

I agree that Gossiper is likely not being spun up correctly.  I think this 
applies to other things as well (in general things seem to have many 
interdependencies on boot.  I don't have anything specific to point at, but it 
feels like there are several race conditions that just happen to usually not be 
exhibited).

One solution I've used in the past that I like for this in general is that none 
of the singletons are inited in their own classes, but only by a 
BootInitializer of sorts that completely controls when the objects are 
inited/started/created/etc, in particular controlling the order things are 
done.  A single entry point into the system if you will.  This seems like some 
work from the point we're at now and probably a bit unnecessary.

Along those lines I had a similar patch that I didn't submit that took a third 
argument, List<IEndpointStateChangeSubsciber> initialSubscribers.  Each 
initialSubscriber was registered before the existing code in start was called.  
initServer built a list of the StorageService and StorageLoadBalance instances 
and passed that to start to ensure none of the publications were missed by 
either service.  Like the 0001-cassandra-0.6-1160.patch references to .instance 
were replaced with getInstance() calls which busy waited (via yield) until 
start completed but this meant that StorageService was managing 
StorageLoadBalancer registrations and that didn't feel right, hence 
preRegistrations() (so each could manager their own registrations).

So, basically two issues/questions if Gossiper.start moves to Gossiper.<init>:

1) how does Gossiper.<init> get the correct generation number (server v client)?

2) what about missed publications because Gossiper was started before 
StorageService and/or StorageLoadBalancer got a chance to register?


> race with insufficiently constructed Gossiper
> ---------------------------------------------
>
>                 Key: CASSANDRA-1160
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1160
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Matthew F. Dennis
>            Priority: Minor
>             Fix For: 0.6.3
>
>         Attachments: 0001-cassandra-0.6-1160.patch
>
>
> Gossiper.start needs to be integrated into the constructor.  Currently you 
> can have threads using the gossiper instance before start finishes (or even 
> starts?), resulting in tracebacks like this:
> ERROR [GMFD:1] 2010-06-02 10:45:49,878 CassandraDaemon.java (line 78) Fatal 
> exception in thread Thread[GMFD:1,5,main]
> java.lang.AssertionError
>       at org.apache.cassandra.net.Header.<init>(Header.java:56)
>       at org.apache.cassandra.net.Header.<init>(Header.java:74)
>       at org.apache.cassandra.net.Message.<init>(Message.java:58)
>       at 
> org.apache.cassandra.gms.Gossiper.makeGossipDigestAckMessage(Gossiper.java:294)
>       at 
> org.apache.cassandra.gms.Gossiper$GossipDigestSynVerbHandler.doVerb(Gossiper.java:935)
>       at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>       at java.lang.Thread.run(Thread.java:619)
> ERROR [GMFD:2] 2010-06-02 10:45:49,880 CassandraDaemon.java (line 78) Fatal 
> exception in thread Thread[GMFD:2,5,main]
> java.lang.AssertionError
>       at org.apache.cassandra.net.Header.<init>(Header.java:56)
>       at org.apache.cassandra.net.Header.<init>(Header.java:74)
>       at org.apache.cassandra.net.Message.<init>(Message.java:58)
>       at 
> org.apache.cassandra.gms.Gossiper.makeGossipDigestAckMessage(Gossiper.java:294)
>       at 
> org.apache.cassandra.gms.Gossiper$GossipDigestSynVerbHandler.doVerb(Gossiper.java:935)
>       at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>       at java.lang.Thread.run(Thread.java:619)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to