[jira] [Updated] (CASSANDRA-6082) 1.1.12 --> 1.2.x upgrade may result inconsistent ring

Chris Burroughs (JIRA) Mon, 23 Sep 2013 06:53:01 -0700

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-6082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Chris Burroughs updated CASSANDRA-6082:
---------------------------------------

    Attachment: c-status
                c-gossipinfo
    
> 1.1.12 --> 1.2.x upgrade may result inconsistent ring
> -----------------------------------------------------
>
>                 Key: CASSANDRA-6082
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6082
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: 1.1.12 --> 1.2.9
>            Reporter: Chris Burroughs
>            Priority: Minor
>         Attachments: c-gossipinfo, c-status
>
>
> This happened to me once, and since I don't have any more 1.1.x clusters I 
> won't be testing again.  I hope the attached files are enough for someone to 
> connect the dots.
> I did a rolling restart to upgrade from 1.1.12 --> 1.2.9.  About a week later 
> I discovered that one node was in an inconsistent state in the ring.  
> Depending on which node I ran nodetool status from, it was either:
>  * up
>  * host-id=null
>  * missing
> I *think* I just missed this during the upgrade but cannot rule out the 
> possibility that it "just happened for no reason" some time after the 
> upgrade.  It was detected when 
> running repair in such a ring caused all sorts of terrible data "duplication" 
> and performance tanked.  Restarting the seeds + "bad" node caused the ring to 
> be consistent again.
> Two possibly suspicious things are an ArrayIndexOutOfBoundsException on 
> startup:
> {noformat}
> ERROR [GossipStage:1] 2013-09-06 10:45:35,213 CassandraDaemon.java (line 194) Exception in thread Thread[GossipStage:1,5,main]
> java.lang.ArrayIndexOutOfBoundsException: 2
>         at org.apache.cassandra.service.StorageService.extractExpireTime(StorageService.java:1660)
>         at org.apache.cassandra.service.StorageService.handleStateRemoving(StorageService.java:1607)
>         at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1230)
>         at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:1958)
>         at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:841)
>         at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:919)
>         at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:50)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> {noformat}
> and problems with hint delivery to multiple nodes:
> {noformat}
> ERROR [MutationStage:11] 2013-09-06 13:59:19,604 CassandraDaemon.java (line 194) Exception in thread Thread[MutationStage:11,5,main]
> java.lang.AssertionError: Missing host ID for 10.20.2.45
>         at org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:583)
>         at org.apache.cassandra.service.StorageProxy$5.runMayThrow(StorageProxy.java:552)
>         at org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:1658)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> {noformat}
> Note, however, that while there were delivery problems to multiple nodes during 
> the rolling upgrade, only one node was in a funky state a week later.
> Attached are the results of running gossipinfo and status on every node.
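
The symptom described above (one node appearing as up, host-id=null, or missing depending on where nodetool status was run) could be caught earlier by comparing each node's view of the ring. The following is only a minimal sketch of that comparison, not anything taken from Cassandra or from the attached files. It assumes the per-node views have already been collected and parsed into maps by some other means; the class name RingViewDiff, the method inconsistentEndpoints, the state strings, and every address other than 10.20.2.45 are hypothetical. It uses Map.of and getOrDefault, so it needs Java 9 or later.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class RingViewDiff {
    // Placeholder state used when an observer does not list an endpoint at all.
    static final String MISSING = "missing";

    /**
     * viewsByObserver: observer node -> (endpoint -> state as that observer reports it).
     * Returns endpoint -> (observer -> state), only for endpoints on which
     * the observers do not all report the same state.
     */
    public static Map<String, Map<String, String>> inconsistentEndpoints(
            Map<String, Map<String, String>> viewsByObserver) {
        // Union of every endpoint that any observer knows about.
        Set<String> allEndpoints = new TreeSet<>();
        for (Map<String, String> view : viewsByObserver.values()) {
            allEndpoints.addAll(view.keySet());
        }

        Map<String, Map<String, String>> disagreements = new TreeMap<>();
        for (String endpoint : allEndpoints) {
            Map<String, String> statePerObserver = new TreeMap<>();
            for (Map.Entry<String, Map<String, String>> e : viewsByObserver.entrySet()) {
                statePerObserver.put(e.getKey(), e.getValue().getOrDefault(endpoint, MISSING));
            }
            // More than one distinct state means the ring views differ for this endpoint.
            if (new HashSet<>(statePerObserver.values()).size() > 1) {
                disagreements.put(endpoint, statePerObserver);
            }
        }
        return disagreements;
    }

    public static void main(String[] args) {
        // Hypothetical, hand-written views; real input would come from parsing
        // the output of nodetool status gathered on each node.
        Map<String, Map<String, String>> views = new TreeMap<>();
        views.put("10.20.2.41", Map.of("10.20.2.41", "up", "10.20.2.45", "up"));
        views.put("10.20.2.42", Map.of("10.20.2.41", "up", "10.20.2.45", "host-id=null"));
        views.put("10.20.2.43", Map.of("10.20.2.41", "up"));
        System.out.println(inconsistentEndpoints(views));
    }
}
```

Run after each step of a rolling upgrade, a check along these lines would flag an endpoint as soon as any two nodes disagree about it, rather than a week later when repair is run.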

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
