Right, I have managed to fix the cluster and am running repair across all the nodes now.
Pavel, I wasn't able to resync the schema because all the nodes were down with duplicated cfids. I had to find a way to edit the sstables manually and fix those duplicates. The only option seemed to be sstable2json, but it was throwing an exception (the same one Cassandra itself throws, coming from Google's library) because of those duplicate cfids.

With the help of a friend we managed to hack sstable2json, and to our surprise we didn't find the offending cfid value in schema_columnfamilies. We looked around the other system tables without luck. At that point we decided to remove one of the offending columns from schema_keyspaces, and that made the node operational. From there I was able to dump the schema via show schema, wipe the system tables, saved_caches and commitlogs (which should have been empty, as I drained all the nodes before restarting them), and restore the schema on each node one by one.

Shouldn't we make sstable2json somehow able to ignore exceptions and run in some "war mode" that would still let people perform their tasks? I know this could be a little hard, as those exceptions come from the underlying libraries, but even just an idea of how to potentially solve this could be useful.

Thanks for the help anyway.

BTW: Any ideas why I didn't find those offending values? I still have a backup of all the system tables, so it would be nice to get to the bottom of this problem.

BTW2: Can I get cfids by querying via cqlsh?

Happy New Year and keep up the good work!

On 27 December 2012 17:47, Pavel Yaskevich <pove...@gmail.com> wrote:

> Hi Michal,
>
> Those ids are stored in system.schema_columnfamilies as part of each
> column family metadata. Have you tried deleting SSTable files from
> <cassandra-data-path/system/schema_columnfamilies on failing nodes and
> restarting them? That should re-fetch schema, also you can try 'nodetool
> resetschema' command which does just that.
>
> Best Regards
> --
> Pavel Yaskevich
>
>
> On Thursday, December 27, 2012 at 6:28 AM, Michał Czerwiński wrote:
>
> > I have this problem (like here
> > http://www.datastax.com/support-forums/topic/cassandra-does-not-startup)
> > with duplicated cfIDs in the information schema, my whole cluster is
> > currently down.
> > I am trying to figure out where those cfIDs are stored.
> > I've managed to hack sstable2json not to throw the same exception about
> > duplicates but cannot find the specific CFID which is duplicated.
> > I am looking at schema_columnfamilies and schema_keyspaces and it's not
> > there...
> >
> > Any ideas would be appreciated...
> >
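
P.S. To make the "war mode" idea from the top of this mail a bit more concrete, here is a rough Python sketch of what I have in mind. It is not sstable2json's actual code. The function name find_duplicate_cfids and the (keyspace, column_family, cfid) row shape are made up for illustration, and I am assuming the schema rows have already been extracted by some other means. The point is only that malformed rows and duplicate cfids get collected and reported instead of aborting the whole run.

```python
from collections import defaultdict


def find_duplicate_cfids(rows):
    """Scan schema rows leniently and report duplicate cfids.

    `rows` is an iterable of (keyspace, column_family, cfid) tuples.
    This shape is a hypothetical stand-in for whatever a schema dump yields.
    Returns (duplicates, errors) instead of raising on the first problem.
    """
    owners_by_cfid = defaultdict(list)
    errors = []

    for row in rows:
        try:
            keyspace, column_family, cfid = row
        except (TypeError, ValueError) as exc:
            # "War mode": remember the bad row and keep going.
            errors.append((row, exc))
            continue
        owners_by_cfid[cfid].append((keyspace, column_family))

    # A cfid claimed by more than one column family is a duplicate.
    duplicates = {
        cfid: owners
        for cfid, owners in owners_by_cfid.items()
        if len(owners) > 1
    }
    return duplicates, errors


if __name__ == "__main__":
    sample = [
        ("ks1", "users", 1000),
        ("ks1", "events", 1001),
        ("ks2", "users", 1000),  # same cfid as ks1.users
        ("broken",),             # malformed row, skipped rather than fatal
    ]
    dups, errs = find_duplicate_cfids(sample)
    for cfid, owners in dups.items():
        print("duplicate cfid", cfid, "used by", owners)
    print(len(errs), "row(s) skipped")
```

Something along these lines would at least have told me which column families were fighting over the same id, without me having to patch the tool first.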