[ https://issues.apache.org/jira/browse/CASSANDRA-6082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Burroughs updated CASSANDRA-6082:
---------------------------------------
Attachment: c-status, c-gossipinfo
> 1.1.12 --> 1.2.x upgrade may result inconsistent ring
> -----------------------------------------------------
>
> Key: CASSANDRA-6082
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6082
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: 1.1.12 --> 1.2.9
> Reporter: Chris Burroughs
> Priority: Minor
> Attachments: c-gossipinfo, c-status
>
>
> This happened to me once, and since I don't have any more 1.1.x clusters I
> won't be testing again. I hope the attached files are enough for someone to
> connect the dots.
> I did a rolling restart to upgrade from 1.1.12 --> 1.2.9. About a week later
> I discovered that one node was in an inconsistent state in the ring. Depending
> on which node I ran nodetool status from, it was shown as one of:
> * up
> * host-id=null
> * missing
> I *think* I just missed this during the upgrade, but I cannot rule out the
> possibility that it "just happened for no reason" some time after the
> upgrade. It was detected when running repair on such a ring caused all sorts
> of terrible data "duplication" and performance tanked. Restarting the seeds +
> the "bad" node made the ring consistent again.
> Two possibly suspicious things are an ArrayIndexOutOfBoundsException on
> startup:
> {noformat}
> ERROR [GossipStage:1] 2013-09-06 10:45:35,213 CassandraDaemon.java (line 194) Exception in thread Thread[GossipStage:1,5,main]
> java.lang.ArrayIndexOutOfBoundsException: 2
> at org.apache.cassandra.service.StorageService.extractExpireTime(StorageService.java:1660)
> at org.apache.cassandra.service.StorageService.handleStateRemoving(StorageService.java:1607)
> at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1230)
> at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:1958)
> at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:841)
> at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:919)
> at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:50)
> at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> {noformat}
> and problems with hint delivery to multiple nodes:
> {noformat}
> ERROR [MutationStage:11] 2013-09-06 13:59:19,604 CassandraDaemon.java (line 194) Exception in thread Thread[MutationStage:11,5,main]
> java.lang.AssertionError: Missing host ID for 10.20.2.45
> at org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:583)
> at org.apache.cassandra.service.StorageProxy$5.runMayThrow(StorageProxy.java:552)
> at org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:1658)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> {noformat}
> Note, however, that while there were delivery problems to multiple nodes during
> the rolling upgrade, only one node was in a funky state a week later.
> Attached are the results of running gossipinfo and status on every node.
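
For anyone who wants to check a ring for this condition after an upgrade, the comparison described in the report (running status on every node and looking for a node that is up in one view, host-id=null in another, and missing in a third) could be automated roughly as follows. This is only a sketch under stated assumptions: the input shape (a mapping from each observing node to the endpoints it sees and their host IDs), the function name and the addresses are all hypothetical, and parsing real nodetool output is deliberately left out because its format is not shown here.

```python
from typing import Dict, List, Optional

# Hypothetical input shape (an assumption, not nodetool's own format):
# observer address -> {endpoint address -> host ID, or None if it shows as null}.
RingView = Dict[str, Optional[str]]


def find_ring_disagreements(views: Dict[str, RingView]) -> Dict[str, List[str]]:
    """For each endpoint, list the observers that do not see it or see no host ID."""
    # Collect every endpoint that at least one observer knows about.
    all_endpoints = set()
    for view in views.values():
        all_endpoints.update(view)

    problems: Dict[str, List[str]] = {}
    for endpoint in sorted(all_endpoints):
        for observer in sorted(views):
            view = views[observer]
            if endpoint not in view:
                problems.setdefault(endpoint, []).append(f"{observer}: missing")
            elif view[endpoint] is None:
                problems.setdefault(endpoint, []).append(f"{observer}: host-id=null")
    return problems


if __name__ == "__main__":
    # Made-up three-node ring in which 10.0.0.3 is seen differently by each node.
    example_views = {
        "10.0.0.1": {"10.0.0.1": "id-1", "10.0.0.2": "id-2"},
        "10.0.0.2": {"10.0.0.1": "id-1", "10.0.0.2": "id-2", "10.0.0.3": None},
        "10.0.0.3": {"10.0.0.1": "id-1", "10.0.0.2": "id-2", "10.0.0.3": "id-3"},
    }
    for endpoint, issues in find_ring_disagreements(example_views).items():
        print(endpoint, issues)
```

An empty result means every observer sees every endpoint with a host ID. The sketch cannot flag a node that no observer reports at all, and it does not compare up/down state or token ownership.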