Re: Node forgets about most of its column families

2012-08-29 Thread aaron morton
But the following nodetool repair crashes. It has to be stopped and then re-started. How did it crash? Are there any suggestions for logging or similar so that we can get a clue the next time this happens? Can you make the logs from #5 available? If you feel you can describe the situation

Re: Node forgets about most of its column families

2012-08-29 Thread aaron morton
Thanks Peter. Is this 1.1.X? Any thoughts on how recent the last schema change was? Had the schema started in a pre-1.1.X cluster? If so, had there been a migration change after the 1.1 upgrade? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

Re: Node forgets about most of its column families

2012-08-29 Thread Edward Sargisson
Hi Aaron, Thanks for the reply. I've recorded what we know at https://issues.apache.org/jira/browse/CASSANDRA-4583. This includes log snippets from two of the nodes from around that time. I don't know what is relevant, so they include everything that was in the system log at the time of the

Re: Node forgets about most of its column families

2012-08-29 Thread aaron morton
For those playing along at home, Edward's ticket was marked as a duplicate of "Problem with creating keyspace after drop" (https://issues.apache.org/jira/browse/CASSANDRA-4219).

Re: Node forgets about most of its column families

2012-08-28 Thread Edward Sargisson
For the record, we just had a recurrence of this. This time, when the node (#5) came back, it didn't properly rejoin the ring. We stopped every node and brought them back one by one to get the ring to link up correctly. Then all the even nodes (#2, #4, #6) had out-of-date schemas. nodetool
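One way to spot the state described above, where some nodes report a different schema version from the rest, is to compare the reported versions side by side. A minimal Python sketch, assuming you have already collected each node's schema version UUID by some means (not shown here); `find_schema_stragglers` is a hypothetical helper, not part of Cassandra or nodetool:

```python
from collections import Counter
from typing import Dict, List


def find_schema_stragglers(versions: Dict[str, str]) -> List[str]:
    """Return the nodes whose schema version differs from the most common one.

    `versions` maps a node name to the schema version it reports.
    Hypothetical helper: how the versions are gathered is left out.
    """
    if not versions:
        return []
    # Treat the version reported by the most nodes as the reference schema.
    # Counter.most_common orders equal counts by first occurrence, so a tie
    # picks whichever version appeared first in `versions`.
    reference, _ = Counter(versions.values()).most_common(1)[0]
    return sorted(node for node, v in versions.items() if v != reference)
```

With six nodes split 3/3, as in the recurrence above, there is no true majority; the helper then uses whichever version it saw first as the reference, so any non-empty result should be read simply as "the cluster disagrees on schema" and investigated.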

Re: Node forgets about most of its column families

2012-08-28 Thread Peter Schuller
I can confirm having seen this (no time to debug). One method of recovery is to delete the node's system tables and then bring it back into the ring with auto_bootstrap set to false and an appropriate token set. That assumes you're willing to have the node take a few bad reads until you're able to

Re: Node forgets about most of its column families

2012-08-24 Thread aaron morton
If this is still a test environment, can you try to reproduce the fault? Or provide some more details on the sequence of events? If you still have the logs around, can you check whether any ERROR level messages were logged?

Re: Node forgets about most of its column families

2012-08-24 Thread Edward Sargisson
Sadly, I don't think we can get much. All I know about the repro is that it happened around a node restart. I've just tried that and everything's fine. I see no ERROR level messages in the logs. Clearly, some other conditions are required, but we don't know them as yet. Many thanks, Edward

Re: Node forgets about most of its column families

2012-08-23 Thread Rob Coli
On Thu, Aug 23, 2012 at 11:47 AM, Edward Sargisson edward.sargis...@globalrelay.net wrote: I was wondering if anybody had seen the following behaviour before and how we might detect it and keep the application running. I don't know the answer to your problem, but anyone who does will want to

Re: Node forgets about most of its column families

2012-08-23 Thread Edward Sargisson
Ah, yes, I forgot that bit, thanks! 1.1.2 running on CentOS. Running nodetool resetlocalschema and then nodetool repair fixed the problem, but not understanding what happened is a concern. Cheers, Edward
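The recovery sequence Edward describes (resetlocalschema, then repair) can be scripted per node. A minimal Python sketch, assuming `nodetool` is on the PATH and accepts `-h <host>`; `reset_schema_then_repair` and its `runner` parameter are hypothetical names, and the script only replays the two commands from this thread rather than addressing the underlying cause:

```python
import subprocess
from typing import Callable, List, Optional


def _run_checked(cmd: List[str]) -> None:
    # check=True raises CalledProcessError on a non-zero exit, so repair
    # is never started after a failed resetlocalschema.
    subprocess.run(cmd, check=True)


def reset_schema_then_repair(
    host: str,
    runner: Optional[Callable[[List[str]], None]] = None,
) -> List[List[str]]:
    """Run `nodetool resetlocalschema`, then `nodetool repair`, against one host.

    Hypothetical helper. `runner` can be replaced (e.g. for a dry run);
    it defaults to executing each command via subprocess.
    Returns the commands in the order they were run.
    """
    run = runner if runner is not None else _run_checked
    commands = [
        ["nodetool", "-h", host, "resetlocalschema"],
        ["nodetool", "-h", host, "repair"],
    ]
    for cmd in commands:
        run(cmd)  # stops at the first command whose runner raises
    return commands
```

Passing `runner=print` gives a dry run that shows the commands without executing them. Note that repair can take a long time on a node with a lot of data.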