[jira] [Created] (CASSANDRA-6082) 1.1.12 --> 1.2.x upgrade may result inconsistent ring

Chris Burroughs (JIRA) Mon, 23 Sep 2013 06:44:01 -0700

Chris Burroughs created CASSANDRA-6082:
------------------------------------------


             Summary: 1.1.12 --> 1.2.x upgrade may result inconsistent ring
                 Key: CASSANDRA-6082
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6082
             Project: Cassandra
          Issue Type: Bug
          Components: Core
         Environment: 1.1.12 --> 1.2.9
            Reporter: Chris Burroughs
            Priority: Minor


This happened to me once, and since I don't have any more 1.1.x clusters I 
won't be testing again.  I hope the attached files are enough for someone to 
connect the dots.

I did a rolling restart to upgrade from 1.1.12 --> 1.2.9.  About a week later I 
discovered that one node was in an inconsistent state in the ring.  Depending 
on which node I ran nodetool status from, it was either:
 * up
 * host-id=null
 * missing

I *think* I just missed this during the upgrade but cannot rule out the 
possibility that it "just happened for no reason" some time after the upgrade.  
It was detected when running repair on such a ring caused all sorts of terrible 
data "duplication" and tanked performance.  Restarting the seeds + "bad" node 
made the ring consistent again.
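
For anyone who wants to check for this kind of disagreement, here is a minimal 
sketch of comparing the ring as seen from each node.  It assumes the per-node 
nodetool status output has already been reduced by hand to a mapping of 
observer -> {peer -> state}; the function name, the addresses and the state 
strings are hypothetical and are not taken from the attached files.

```python
def find_inconsistent_peers(views):
    """Return {peer: {observer: state}} for peers the observers disagree on.

    `views` maps each observer address to a dict of {peer address: state}.
    A peer that an observer does not list at all is recorded as "missing".
    """
    all_peers = set()
    for peers in views.values():
        all_peers.update(peers)

    inconsistent = {}
    for peer in sorted(all_peers):
        seen = {obs: peers.get(peer, "missing") for obs, peers in views.items()}
        # More than one distinct state means the observers disagree.
        if len(set(seen.values())) > 1:
            inconsistent[peer] = seen
    return inconsistent


# Hypothetical, hand-written views for illustration only.
views = {
    "10.0.0.1": {"10.0.0.1": "up", "10.0.0.2": "up",
                 "10.0.0.3": "up", "10.0.0.4": "up"},
    "10.0.0.2": {"10.0.0.1": "up", "10.0.0.2": "up",
                 "10.0.0.3": "host-id=null", "10.0.0.4": "up"},
    "10.0.0.4": {"10.0.0.1": "up", "10.0.0.2": "up", "10.0.0.4": "up"},
}

report = find_inconsistent_peers(views)
for peer, seen in report.items():
    print(peer, seen)
```

Running something like this right after each node restarts during a rolling 
upgrade might have caught the problem before repair was run.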

Two possibly suspicious things are an ArrayIndexOutOfBoundsException on startup:

{noformat}
ERROR [GossipStage:1] 2013-09-06 10:45:35,213 CassandraDaemon.java (line 194) Exception in thread Thread[GossipStage:1,5,main]
java.lang.ArrayIndexOutOfBoundsException: 2
        at org.apache.cassandra.service.StorageService.extractExpireTime(StorageService.java:1660)
        at org.apache.cassandra.service.StorageService.handleStateRemoving(StorageService.java:1607)
        at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1230)
        at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:1958)
        at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:841)
        at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:919)
        at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:50)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
{noformat}

and problems with hint delivery to multiple nodes:

{noformat}
ERROR [MutationStage:11] 2013-09-06 13:59:19,604 CassandraDaemon.java (line 194) Exception in thread Thread[MutationStage:11,5,main]
java.lang.AssertionError: Missing host ID for 10.20.2.45
        at org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:583)
        at org.apache.cassandra.service.StorageProxy$5.runMayThrow(StorageProxy.java:552)
        at org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:1658)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
{noformat}


Note, however, that while there were delivery problems to multiple nodes during 
the rolling upgrade, only one node was in a funky state a week later.

Attached are the results of running gossipinfo and status on every node.

