Re: Losing keyspace on cassandra upgrade

Michael Kjellman Wed, 19 Sep 2012 08:33:07 -0700

@Edward Do you have a bug number for that by chance?

On Sep 19, 2012, at 8:25 AM, "Edward Sargisson" 
<edward.sargis...@globalrelay.net> wrote:


We've seen that before too - supposedly it was fixed in 1.1.5. Your experience 
casts some doubt on that.

Our workaround, thus far, is to shut down the entire ring and then bring each 
node back up, starting with a known-good one.
Then you do nodetool resetlocalschema on the node that's confused and make sure 
it gets the schema linked up properly.
Then nodetool repair.

I see you've done that but we found a complete ring restart was necessary. This 
was on Cass 1.1.1.
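
The workaround described above can be written down as an ordered checklist. 
This is only a sketch: the host names are hypothetical, running repair on the 
confused nodes (rather than on every node) is an assumption, and nothing is 
executed, the steps are just printed for review.

```python
def recovery_plan(known_good, confused):
    """Return the workaround as an ordered list of human-readable steps.

    known_good and confused are lists of host names (hypothetical inputs).
    Nothing is executed; the plan is only meant to be reviewed.
    """
    plan = ["stop Cassandra on every node in the ring"]
    # Bring the ring back up, known-good nodes first.
    for host in list(known_good) + list(confused):
        plan.append("start Cassandra on " + host)
    # Reset the local schema on each confused node and check the result.
    for host in confused:
        plan.append("nodetool -h " + host + " resetlocalschema")
        plan.append("check that " + host + " agrees on the schema version")
    # Finally, repair the nodes that had lost their schema (assumption).
    for host in confused:
        plan.append("nodetool -h " + host + " repair")
    return plan


for step in recovery_plan(["node2"], ["node1"]):
    print(step)
```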

Cheers,
Edward

On 12-09-19 08:12 AM, Michael Kjellman wrote:

Sounds like you are losing your system keyspace. When you say nothing 
important changed between the yaml files, do you mean with or without your 
changes?

Did your data directories change in the migration? Permissions okay?

I've done a 1.1.1 to 1.1.5 upgrade on many of my nodes without issue.

On Sep 19, 2012, at 7:44 AM, "Thomas Stets" <thomas.st...@gmail.com> wrote:



I keep losing my keyspace when upgrading from cassandra 1.1.1 to 1.1.5.

I have the same cassandra keyspace on all our staging systems:

development:  a 3-node cluster
integration: a 3-node cluster
QS: a 2-node cluster
(production will be a 4-node cluster, which is not yet active)

All clusters were running cassandra 1.1.1. Before going into production I 
wanted to upgrade to the latest production-ready version of cassandra.

In all cases my keyspace disappeared when I started the cluster with cassandra 
1.1.5. On the development system I didn't realize at first what was happening; 
I just wondered why nodetool showed so little data. On integration I saw the 
problem quickly, but could not recover the data. I re-installed the cassandra 
cluster from scratch and populated it with our test data, so our developers 
could work.

I am currently using the QS system to recreate the problem and to find out 
what I am doing wrong, and how I can avoid losing production data once we are 
live.

Basically I was doing the following:

1. create a snapshot on every node
2. create a tar.gz of my data directory, just to be safe
3. shut down and re-start cassandra 1.1.1 (just to see that it is not the 
re-start that is creating the problem)
4. verify that the keyspace is still known, and the data present.
5. shut down cassandra 1.1.1
6. copy the config to cassandra 1.1.5 (doing a diff of cassandra.yaml to the 
new one first, to see whether anything important has changed)
7. start cassandra 1.1.5
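
For the yaml comparison in step 6, the settings most likely to make a node 
start up without its data are the directory entries. A minimal sketch, 
assuming the flat layout of the stock cassandra.yaml; the parser is 
deliberately naive (not a general YAML reader) and the paths in the sample 
are purely illustrative:

```python
DIRECTORY_KEYS = (
    "data_file_directories",
    "commitlog_directory",
    "saved_caches_directory",
)


def directory_settings(yaml_text):
    """Extract the directory settings from cassandra.yaml text."""
    found = {}
    open_list = None  # key whose "- item" lines we are currently collecting
    for raw in yaml_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments
        stripped = line.strip()
        if not stripped:
            continue
        if open_list is not None and stripped.startswith("- "):
            found[open_list].append(stripped[2:].strip())
            continue
        open_list = None
        if line[0].isspace():
            continue  # nested setting, not one of the top-level keys
        key, sep, value = stripped.partition(":")
        if sep and key in DIRECTORY_KEYS:
            value = value.strip()
            found[key] = [value] if value else []
            if not value:
                open_list = key
    return found


def changed_directories(old_text, new_text):
    """Return the directory keys whose values differ between two files."""
    old, new = directory_settings(old_text), directory_settings(new_text)
    return sorted(k for k in DIRECTORY_KEYS if old.get(k) != new.get(k))


# Illustrative input only; real files would be read from disk.
old_yaml = """\
data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
saved_caches_directory: /var/lib/cassandra/saved_caches
"""
new_yaml = old_yaml.replace("/var/lib/cassandra/data", "/opt/cassandra-1.1.5/data")
print(changed_directories(old_yaml, new_yaml))
```

An empty result means the two files point at the same data, commit log and 
saved-caches locations; it says nothing about permissions on those paths.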

In the log file, after the "Replaying ..." messages I find the following:

 INFO [main] 2012-09-19 15:15:50,323 CommitLogReplayer.java (line 103) Skipped 759 mutations from unknown (probably removed) CF with id 1187
 INFO [main] 2012-09-19 15:15:50,323 CommitLogReplayer.java (line 103) Skipped 606 mutations from unknown (probably removed) CF with id 1186
 INFO [main] 2012-09-19 15:15:50,323 CommitLogReplayer.java (line 103) Skipped 53 mutations from unknown (probably removed) CF with id 1185
 INFO [main] 2012-09-19 15:15:50,323 CommitLogReplayer.java (line 103) Skipped 1945 mutations from unknown (probably removed) CF with id 1184
 INFO [main] 2012-09-19 15:15:50,323 CommitLogReplayer.java (line 103) Skipped 1945 mutations from unknown (probably removed) CF with id 1191
 INFO [main] 2012-09-19 15:15:50,323 CommitLogReplayer.java (line 103) Skipped 7506 mutations from unknown (probably removed) CF with id 1190
 INFO [main] 2012-09-19 15:15:50,324 CommitLogReplayer.java (line 103) Skipped 88 mutations from unknown (probably removed) CF with id 1189
 INFO [main] 2012-09-19 15:15:50,324 CommitLogReplayer.java (line 103) Skipped 87 mutations from unknown (probably removed) CF with id 1188
 INFO [main] 2012-09-19 15:15:50,324 CommitLogReplayer.java (line 103) Skipped 354 mutations from unknown (probably removed) CF with id 1195
 INFO [main] 2012-09-19 15:15:50,324 CommitLogReplayer.java (line 103) Skipped 87 mutations from unknown (probably removed) CF with id 1194
 INFO [main] 2012-09-19 15:15:50,324 CommitLogReplayer.java (line 103) Skipped 45 mutations from unknown (probably removed) CF with id 1192
 INFO [main] 2012-09-19 15:15:50,324 CommitLogReplayer.java (line 103) Skipped 82 mutations from unknown (probably removed) CF with id 1197
 INFO [main] 2012-09-19 15:15:50,324 CommitLogReplayer.java (line 103) Skipped 46386 mutations from unknown (probably removed) CF with id 1177
 INFO [main] 2012-09-19 15:15:50,324 CommitLogReplayer.java (line 103) Skipped 69 mutations from unknown (probably removed) CF with id 1178
 INFO [main] 2012-09-19 15:15:50,325 CommitLogReplayer.java (line 103) Skipped 73 mutations from unknown (probably removed) CF with id 1179
 INFO [main] 2012-09-19 15:15:50,325 CommitLogReplayer.java (line 103) Skipped 88 mutations from unknown (probably removed) CF with id 1181
 INFO [main] 2012-09-19 15:15:50,325 CommitLogReplayer.java (line 103) Skipped 46386 mutations from unknown (probably removed) CF with id 1182
 INFO [main] 2012-09-19 15:15:50,325 CommitLogReplayer.java (line 103) Skipped 7506 mutations from unknown (probably removed) CF with id 1183
 INFO [main] 2012-09-19 15:15:50,325 CommitLog.java (line 131) Log replay complete, 0 replayed mutations
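
To see at a glance how much the replay dropped, the "Skipped ..." messages 
quoted above can be totalled per column family id. A small sketch; the 
function name is made up for illustration, and the pattern only assumes the 
message wording shown in this log excerpt (whitespace is matched loosely so 
that mail-wrapped lines still parse):

```python
import re

# Wording taken from the CommitLogReplayer lines quoted above; \s+ tolerates
# line breaks introduced by the mail client.
SKIPPED = re.compile(
    r"Skipped\s+(\d+)\s+mutations\s+from\s+unknown\s+"
    r"\(probably\s+removed\)\s+CF\s+with\s+id\s+(\d+)"
)


def skipped_mutations(log_text):
    """Return {cf_id: skipped_count}, summed over all matching messages."""
    totals = {}
    for count, cf_id in SKIPPED.findall(log_text):
        totals[int(cf_id)] = totals.get(int(cf_id), 0) + int(count)
    return totals


# First two entries of the excerpt, wrapped as they arrived by mail.
sample = """\
 INFO [main] 2012-09-19 15:15:50,323 CommitLogReplayer.java (line 103) Skipped
759 mutations from unknown (probably removed) CF with id 1187
 INFO [main] 2012-09-19 15:15:50,323 CommitLogReplayer.java (line 103) Skipped
606 mutations from unknown (probably removed) CF with id 1186
"""
totals = skipped_mutations(sample)
print(totals, "total skipped:", sum(totals.values()))
```

Every id that shows up here is a column family the commit log refers to but 
the loaded schema does not contain.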

This is the first obvious indication something is wrong. Going further up in 
the log file I discover that the SSTableReader logs only system keyspace files.

Currently my cluster is in the following state:

node 1 runs cassandra 1.1.5, and doesn't know my keyspace.
node 2 runs cassandra 1.1.1, and still knows my keyspace.

nodetool ring confirms this: node 1 has a load of 29 KB, node 2 of roughly 
1 GB. The cluster itself is still intact, i.e. nodetool ring shows both nodes.

I tried a nodetool resetlocalschema, and nodetool repair, but that didn't 
change anything.

Any idea what I have been doing wrong (the preferred solution), or whether I 
stumbled over a cassandra bug (not so nice)?


  TIA, Thomas


'Like' us on Facebook for exclusive content and other resources on all 
Barracuda Networks solutions.
Visit http://barracudanetworks.com/facebook




--

Edward Sargisson

senior java developer
Global Relay

edward.sargis...@globalrelay.net


866.484.6630
New York | Chicago | Vancouver  |  London  (+44.0800.032.9829)  |  Singapore  
(+65.3158.1301)
