[ https://issues.apache.org/jira/browse/CASSANDRA-12131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15505388#comment-15505388 ]
Aleksey Yeschenko commented on CASSANDRA-12131:
-----------------------------------------------

It's unfortunate, but again, I don't see what can be done here on the Cassandra side :\

system_schema corruption causing nodes to not restart
-----------------------------------------------------

                Key: CASSANDRA-12131
                URL: https://issues.apache.org/jira/browse/CASSANDRA-12131
            Project: Cassandra
         Issue Type: Bug
           Reporter: Tom van der Woerdt
           Priority: Critical

h2. Symptoms

* Existing nodes fail to restart
* system_schema has broken data
* `nodetool describecluster` shows a full disagreement

This happened on two clusters I manage, and so far I have no idea why. I'll describe the symptoms and what I did to (partially) resolve this, in the hope that the actual bug can get fixed.

Both clusters run the binary distribution from cassandra.apache.org. One runs on CentOS 6, the other on CentOS 7, both with Java 8u77. The issue was seen on version 3.0.4 and during an upgrade from 3.0.6 to 3.0.7.

h2. Cluster 1

Version: 3.0.4
Hardware: 2 datacenters, 3 machines each
Network: 1Gbit, <1ms within the dc, <20ms cross-dc

This happened several months ago. I found out the hard way that every node had a different schema_version when I tried to restart a node and it didn't come back. Assuming it was just a single unhappy node, I ignored it and restarted a second node (in a different datacenter), which also did not come back.

I like my quorums, so I didn't restart the other nodes. `nodetool describecluster` showed that every node had a different schema version. Querying system_schema showed a lot of records with their keys set to `\0\0\0\0(...)\0\0`. Cassandra logs indicated corrupted data, which was then "fixed" by running scrub.

Of course that didn't actually fix the data, so using CQL I removed most of the rows in system_schema that looked wrong. After doing that, `nodetool describecluster` agreed on a schema version again.
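As an aside, the wrong-looking rows are easy to find programmatically. The sketch below is not from the report; it assumes the DataStax cassandra-driver package and a placeholder contact point, and just flags schema identifiers that consist entirely of NUL bytes:

```python
# Hedged sketch (not part of the original report): flag system_schema rows
# whose identifiers are made only of NUL bytes, the corruption pattern above.
# Host "127.0.0.1" is a placeholder; needs `pip install cassandra-driver`.

def is_corrupt_name(name):
    """True when a schema identifier is non-empty and all NUL bytes."""
    return len(name) > 0 and set(name) <= {"\x00"}

def find_corrupt_keyspaces(host="127.0.0.1"):
    # Imported lazily so the pure helper above works without the driver.
    from cassandra.cluster import Cluster
    session = Cluster([host]).connect()
    rows = session.execute("SELECT keyspace_name FROM system_schema.keyspaces")
    return [r.keyspace_name for r in rows if is_corrupt_name(r.keyspace_name)]
```

On an affected node, `find_corrupt_keyspaces()` would return the NUL-named keyspaces; the same check applies to table and column names in the other system_schema tables.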
I've attached the python script I used to remove the records from the 'columns' table (fix.py); similar scripts were used for other tables.

That didn't actually remove all the records; some proved impossible to delete:

{code}
# Partial output from the query "select * from system_schema.columns"
 ... | regular | -1 | text
 system_distributed | \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 | \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 | none | exception_stacktrace | regular | -1 | text
 system_distributed | \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 | \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 | none | finished_at | regular | -1 | timestamp
 system_distributed | \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 | \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 | none | keyspace_name
{code}

... so I just left those there, as it doesn't seem to impact the cluster other than spewing this error every minute:

{code}
ERROR [CompactionExecutor:20] 2016-07-04 14:19:59,798 CassandraDaemon.java:201 - Exception in thread Thread[CompactionExecutor:20,1,main]
java.lang.AssertionError: Invalid clustering for the table: org.apache.cassandra.db.Clustering$2@661b79a
	at org.apache.cassandra.db.Clustering$Serializer.serialize(Clustering.java:136) ~[apache-cassandra-3.0.7.jar:3.0.7]
	at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:159) ~[apache-cassandra-3.0.7.jar:3.0.7]
	at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:108) ~[apache-cassandra-3.0.7.jar:3.0.7]
	at org.apache.cassandra.db.ColumnIndex$Builder.add(ColumnIndex.java:144) ~[apache-cassandra-3.0.7.jar:3.0.7]
	at org.apache.cassandra.db.ColumnIndex$Builder.build(ColumnIndex.java:112) ~[apache-cassandra-3.0.7.jar:3.0.7]
	at org.apache.cassandra.db.ColumnIndex.writeAndBuildIndex(ColumnIndex.java:52) ~[apache-cassandra-3.0.7.jar:3.0.7]
{code}

The cluster works fine now, minus the phantom rows and the minutely error on every node. As for the two boxes that got killed: they were `removenode`d and added back somewhere in this process.

h2. Cluster 2

Version: 3.0.6
Hardware: 3 datacenters, 13 machines total
Network: 1Gbit, <1ms within the dc, <50ms cross-dc

This is a cluster I use for tests, which involves doing a lot of keyspace changes. While doing a 3.0.6->3.0.7 upgrade this morning, I noticed that the first box I wanted to upgrade didn't come back.
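The attached fix.py is not reproduced in this report. A rough reconstruction of what such a cleanup for the 'columns' table might look like (my own hypothetical sketch, assuming the DataStax cassandra-driver; the host and `dry_run` flag are placeholders, not from the original script):

```python
# Hypothetical reconstruction of a fix.py-style cleanup, NOT the attached
# script: delete system_schema.columns rows whose identifiers are all NUL
# bytes. system_schema uses LocalStrategy, so run this against each node.

def looks_corrupt(value):
    """A non-empty identifier made only of NUL bytes is corrupt."""
    return bool(value) and set(value) <= {"\x00"}

def delete_corrupt_columns(host="127.0.0.1", dry_run=True):
    # Imported lazily so the pure helper above works without the driver.
    from cassandra.cluster import Cluster
    session = Cluster([host]).connect()
    victims = []
    for row in session.execute(
            "SELECT keyspace_name, table_name, column_name "
            "FROM system_schema.columns"):
        if looks_corrupt(row.table_name) or looks_corrupt(row.column_name):
            victims.append((row.keyspace_name, row.table_name, row.column_name))
            if not dry_run:
                session.execute(
                    "DELETE FROM system_schema.columns "
                    "WHERE keyspace_name = %s AND table_name = %s "
                    "AND column_name = %s",
                    (row.keyspace_name, row.table_name, row.column_name))
    return victims
```

As the report notes, some rows survive even this kind of DELETE, which is why scrub has to be run first and a few phantom rows may remain regardless.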
Startup on this node fails with:

{code}
ERROR [main] 2016-07-04 09:58:44,306 CassandraDaemon.java:698 - Exception encountered during startup
java.lang.AssertionError: null
	at org.apache.cassandra.config.ColumnDefinition.<init>(ColumnDefinition.java:155) ~[apache-cassandra-3.0.7.jar:3.0.7]
	at org.apache.cassandra.schema.SchemaKeyspace.createColumnFromRow(SchemaKeyspace.java:1015) ~[apache-cassandra-3.0.7.jar:3.0.7]
	at org.apache.cassandra.schema.SchemaKeyspace.lambda$fetchColumns$12(SchemaKeyspace.java:995) ~[apache-cassandra-3.0.7.jar:3.0.7]
	at java.lang.Iterable.forEach(Iterable.java:75) ~[na:1.8.0_77]
	at org.apache.cassandra.schema.SchemaKeyspace.fetchColumns(SchemaKeyspace.java:995) ~[apache-cassandra-3.0.7.jar:3.0.7]
	at org.apache.cassandra.schema.SchemaKeyspace.fetchTable(SchemaKeyspace.java:949) ~[apache-cassandra-3.0.7.jar:3.0.7]
	at org.apache.cassandra.schema.SchemaKeyspace.fetchTables(SchemaKeyspace.java:928) ~[apache-cassandra-3.0.7.jar:3.0.7]
	at org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspace(SchemaKeyspace.java:891) ~[apache-cassandra-3.0.7.jar:3.0.7]
	at org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspacesWithout(SchemaKeyspace.java:868) ~[apache-cassandra-3.0.7.jar:3.0.7]
	at org.apache.cassandra.schema.SchemaKeyspace.fetchNonSystemKeyspaces(SchemaKeyspace.java:856) ~[apache-cassandra-3.0.7.jar:3.0.7]
	at org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:136) ~[apache-cassandra-3.0.7.jar:3.0.7]
	at org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:126) ~[apache-cassandra-3.0.7.jar:3.0.7]
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:235) [apache-cassandra-3.0.7.jar:3.0.7]
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:557) [apache-cassandra-3.0.7.jar:3.0.7]
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:685) [apache-cassandra-3.0.7.jar:3.0.7]
{code}

`nodetool status -r`:

{code}
Datacenter: One
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                      Load      Tokens  Owns (effective)  Host ID                               Rack
DN  cassandra1.dc1.mydomain.com  10.38 MB  256     6.8%              7470d016-9a45-4e00-819a-77d7e09a14a2  1r1
UN  cassandra2.dc1.mydomain.com  8.64 MB   256     7.3%              cb93240d-b1c6-47f0-a1bb-59e4ae127a1f  1r2
UN  cassandra3.dc1.mydomain.com  11.32 MB  256     7.6%              ff6b3342-8142-42ba-8dd0-da00cd4ae95f  1r3
UN  cassandra4.dc1.mydomain.com  12.46 MB  256     7.2%              91fad227-b394-4e25-be65-0f34a9dbbf9b  1r4
UN  cassandra5.dc1.mydomain.com  12.03 MB  256     8.4%              74d98f17-df0b-40f2-b23b-7c6e5f49c2d7  1r5
Datacenter: Two
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                      Load      Tokens  Owns (effective)  Host ID                               Rack
UN  cassandra1.dc2.mydomain.com  10.39 MB  256     7.8%              f49efc68-d530-4074-912a-b008f578c9d0  2r1
UN  cassandra2.dc2.mydomain.com  8.23 MB   256     8.5%              b339a66e-4ef7-43c2-9507-9ac23dd7ad5c  2r2
UN  cassandra3.dc2.mydomain.com  10.34 MB  256     7.2%              28d51ab8-5ee2-41a7-9e93-247fdf9f6d85  2r3
Datacenter: Three
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                      Load      Tokens  Owns (effective)  Host ID                               Rack
UN  cassandra1.dc3.mydomain.com  9.47 MB   256     7.2%              bbd06f32-6d40-49f4-b71c-30227aac20f1  3r1
UN  cassandra2.dc3.mydomain.com  9.88 MB   256     7.7%              2789cffd-db20-47b9-962e-193326660345  3r2
UN  cassandra3.dc3.mydomain.com  11.36 MB  256     8.5%              9a11ad49-112b-4b43-b937-f5e12176d725  3r3
UN  cassandra4.dc3.mydomain.com  11.77 MB  256     7.6%              1009f985-2229-45c6-88c5-64ee508c4c3c  3r4
UN  cassandra5.dc3.mydomain.com  11.11 MB  256     7.9%              4cbac3e8-c412-4375-ba2b-354a0bd81df8  3r5
{code}

`nodetool describecluster`:

{code}
Cluster Information:
	Name: my_cluster
	Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
	Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
	Schema versions:
		c478ac2f-c773-370c-aeca-d1a7169ad092: [10.xxx.xxx.xxx]
		35f98fc6-3ddc-3358-9c92-d5a251ebc844: [10.xxx.xxx.xxx]
		a1573012-90a1-303f-81af-2ddc387cfc98: [10.xxx.xxx.xxx]
		c4a86820-60ea-371e-a24c-31b2040d18f1: [10.xxx.xxx.xxx]
		1a734c68-c72f-3f0e-ac51-6fadc7854447: [10.xxx.xxx.xxx]
		5042d7d8-c1d2-334c-95ce-443260401940: [10.xxx.xxx.xxx]
		dfc67ce1-5422-30e8-a533-9c2f0c2f7ad9: [10.xxx.xxx.xxx]
		0f32b476-0e6f-3064-8795-5d8adc2b3704: [10.xxx.xxx.xxx]
		31b66ee1-9447-39ff-9953-bad4b01ba87b: [10.xxx.xxx.xxx]
		7bb3cee9-eef5-356a-b435-9500550fda00: [10.xxx.xxx.xxx]
		6adcfe50-2a16-3bc5-93d0-006481c6217e: [10.xxx.xxx.xxx]
		5bb7c619-3e64-3ae0-b50e-8a6b5af78b1a: [10.xxx.xxx.xxx]
		UNREACHABLE: [10.xxx.xxx.xxx]
{code}

Like the other cluster, this cluster has a corrupted system_schema. Partial output from "select * from system_schema.keyspaces":

{code}
 keyspace_name                                | durable_writes | replication
----------------------------------------------+----------------+----------------------------------------------------------------------------------------------------------
 system_auth                                  | True           | {'One': '5', 'Two': '3', 'Three': '5', 'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy'}
 system_schema                                | True           | {'class': 'org.apache.cassandra.locator.LocalStrategy'}
 \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 | False          | {}
{code}

The cluster is still up and is able to take reads and writes. In cqlsh's `desc keyspaces` I see an additional keyspace that pretends to be an empty string:

{code}
cassandra@cqlsh> desc keyspaces;

system_distributed  system_schema  system  system_traces  ""  system_auth
{code}

Very curious.

This issue can potentially destroy a cluster, so I'm marking this as critical. The fix for broken nodes seems to be to run my fix.py against every node and against every table in system_schema, after running a scrub on those same nodes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)