I have an update on this.  I saw the same split-ring problem again, this time 
while doing a rolling upgrade from 1.1.4 to 1.1.6, and found an easier 
workaround than modifying configs and restarting.  Explicitly specifying the 
same token on the command line with "-Dcassandra.replace_token=" when 
bringing up the new node avoided the problem.  Everything worked smoothly.

Ron
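
[Note on the flag above: "-D" is the standard JVM option for setting a system
property, so "-Dcassandra.replace_token=<token>" makes the token string
visible to Java code through System.getProperty. The sketch below only
illustrates that mechanism. It is not Cassandra's code: the class name
ReplaceTokenProperty and the helper replaceToken() are hypothetical, and how
Cassandra 1.1.x actually parses the value is not shown in this thread.]

```java
import java.util.Optional;

final class ReplaceTokenProperty {
    // Property name taken from the flag used in this thread.
    static final String PROPERTY = "cassandra.replace_token";

    // Hypothetical helper: returns the trimmed token string if the flag
    // was supplied with a non-empty value, otherwise an empty Optional.
    static Optional<String> replaceToken() {
        String raw = System.getProperty(PROPERTY);
        if (raw == null || raw.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(raw.trim());
    }

    public static void main(String[] args) {
        Optional<String> token = replaceToken();
        if (token.isPresent()) {
            System.out.println("replace_token requested: " + token.get());
        } else {
            System.out.println("replace_token not set");
        }
    }
}
```

Started with "java -Dcassandra.replace_token=<token> ReplaceTokenProperty" the
sketch reports the token; started without the flag it reports that none was set.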

On Oct 10, 2012, at 12:38 PM, Ron Siemens wrote:

> 
> I witnessed the same behavior as reported by Edward and James.
> 
> Removing the host from its own seed list does not solve the problem.  
> Removing it from the config of all nodes, restarting each of them, and then 
> restarting the failed node worked.
> 
> Ron
> 
> On Sep 12, 2012, at 4:42 PM, Edward Sargisson wrote:
> 
>> I'm reposting my colleague's reply to Rob to the list (with James' 
>> permission) in case others are interested.
>> 
>> I'll add to James' post below to say I don't believe we saw the message that 
>> that slice of code would have printed.
>> 
>> "
>> Hey Rob,
>> 
>> Ed's AWOL right now and I'm not on u@c.a.o, but I can tell you that when 
>> I removed the downed seed node from its own list of seed nodes in 
>> cassandra.yaml, it neither joined the existing ring nor got any 
>> schemas or data from it; it felt like timeouts were 
>> happening. (IANA Cassandra wizard, so excuse my terminology impedance.)
>> 
>> After I changed the machine's hostname and gave it a new IP, it behaved as 
>> expected: it joined the ring and synced both schema and associated data.
>> 
>> Downed node is 1.1.4, the rest of the ring is 1.1.2.
>> 
>> I'm in a situation where I can revert the IP/hostname change and retry 
>> the scenario as needed if you've got any ideas.
>> 
>> HTH,
>> 
>>    James"
>> 
>> Cheers,
>> Edward
>> 
>> On 12-09-12 03:53 PM, Rob Coli wrote:
>>> On Tue, Sep 11, 2012 at 4:21 PM, Edward Sargisson
>>> <edward.sargis...@globalrelay.net> wrote:
>>>> If the downed node is a seed node then neither of the replace-a-dead-node
>>>> procedures works (-Dcassandra.replace_token and taking initial_token-1). The
>>>> ring remains split.
>>>> [...]
>>>> In other words, if the host name is on the seeds list then it appears that
>>>> the rest of the ring refuses to bootstrap it.
>>> Close, but not exactly...
>>> 
>>> "./src/java/org/apache/cassandra/service/StorageService.java" line 559 of 3090
>>> "
>>> if (DatabaseDescriptor.isAutoBootstrap()
>>>         && DatabaseDescriptor.getSeeds().contains(FBUtilities.getBroadcastAddress())
>>>         && !SystemTable.isBootstrapped())
>>>     logger_.info("This node will not auto bootstrap because it is configured to be a seed node.");
>>> "
>>> 
>>> getSeeds asks your seed provider for a list of seeds. If you are using
>>> the SimpleSeedProvider, this basically turns the list from "seeds" in
>>> cassandra.yaml on the local node into a list of hosts.
>>> 
>>> So it isn't that the other nodes have this node in their seed list;
>>> it's that the node you are replacing has itself in its own seed list,
>>> and shouldn't. I understand that it can be tricky in conf management
>>> tools to make seed nodes' seed lists not contain themselves, but I
>>> believe it is currently necessary in this case.
>>> 
>>> FWIW, it's unclear to me (and to Aaron Morton, whose curiosity was
>>> apparently equally piqued and who is looking into it further) why
>>> exactly seed nodes shouldn't bootstrap. It's possible that they only
>>> shouldn't bootstrap without being in "hibernate" mode, and that the
>>> code just hasn't been re-written post replace_token/hibernate to say
>>> that it's ok for seed nodes to bootstrap as long as they hibernate...
>>> 
>>> =Rob
>>> 
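
[Note: the condition Rob quotes can be modeled with plain values to see why a
node that lists itself as a seed trips it. This is a minimal sketch under
assumptions: the class SeedBootstrapCheck, the method wouldSkipAutoBootstrap
and the 10.0.0.x addresses are hypothetical, and in Cassandra the real inputs
come from DatabaseDescriptor, FBUtilities and SystemTable as in the quoted
snippet. Going by the quoted log message, a true result corresponds to "will
not auto bootstrap".]

```java
import java.util.Arrays;
import java.util.List;

final class SeedBootstrapCheck {
    // Same boolean shape as the quoted condition: auto bootstrap is enabled,
    // the node's own address is in its local seed list, and the node has
    // not bootstrapped before.
    static boolean wouldSkipAutoBootstrap(boolean autoBootstrap,
                                          List<String> localSeeds,
                                          String ownAddress,
                                          boolean alreadyBootstrapped) {
        return autoBootstrap
                && localSeeds.contains(ownAddress)
                && !alreadyBootstrapped;
    }

    public static void main(String[] args) {
        // Hypothetical seed list as read from the local node's config.
        List<String> seeds = Arrays.asList("10.0.0.1", "10.0.0.2");

        // Replacement node that still lists itself as a seed.
        System.out.println(wouldSkipAutoBootstrap(true, seeds, "10.0.0.1", false)); // true

        // Same node with its own address absent from the local seed list.
        System.out.println(wouldSkipAutoBootstrap(true, seeds, "10.0.0.3", false)); // false
    }
}
```

Only the local node's seed list enters the check, which matches Rob's point
that what the other nodes have in their seed lists is not what triggers it.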
>> 
>> -- 
>> Edward Sargisson
>> senior java developer
>> Global Relay
>> 
>> edward.sargis...@globalrelay.net
> 
