[
https://issues.apache.org/jira/browse/CASSANDRA-12131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Philip Thompson updated CASSANDRA-12131:
----------------------------------------
Fix Version/s: 3.0.x
> system_schema corruption causing nodes to not restart
> -----------------------------------------------------
>
> Key: CASSANDRA-12131
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12131
> Project: Cassandra
> Issue Type: Bug
> Reporter: Tom van der Woerdt
> Priority: Critical
> Fix For: 3.0.x
>
>
> Symptoms:
> * Existing nodes fail to restart
> * system_schema has broken data
> * `nodetool describecluster` shows a full disagreement
> This happened on two clusters I manage, and so far I have no idea why. I'll
> describe the symptoms and what I did to (partially) resolve this, in the hope
> that the actual bug can get fixed.
> All clusters run the binary distribution from cassandra.apache.org. One
> cluster runs on CentOS 6, the other on CentOS 7, both with Java 8u77. The
> issue was seen on version 3.0.4 and during an upgrade from 3.0.6 to 3.0.7.
> ** Cluster 1 **
> Version: 3.0.4
> Hardware: 2 datacenters, 3 machines each
> Network: 1Gbit, <1ms within the dc, <20ms cross-dc
> This happened several months ago. I found out the hard way that every node
> had a different schema_version when I tried to restart a node and it didn't
> come back. Assuming it was just a single unhappy node, I ignored it and
> restarted a second node (in a different datacenter), which also did not come
> back.
> I like my quorums, so I didn't restart the other nodes. `nodetool
> describecluster` showed that every node had a different schema version.
> Querying system_schema showed a lot of records with their keys set to
> `\0\0\0\0(...)\0\0`. Cassandra logs indicated corrupted data, which was
> supposedly fixed by running scrub.
> Of course that didn't actually fix the data, so using CQL I removed most of
> the rows in system_schema that looked wrong. After doing that, `nodetool
> describecluster` agreed on a schema version again. I've attached the python
> script I used to remove the records from the `columns` table (fix.py);
> similar scripts were used for the other tables.
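> For reference, the approach boils down to roughly this (a simplified sketch,
> not the attached script verbatim; it assumes the DataStax python driver and,
> as described above, a node that accepts CQL deletes against system_schema;
> the contact point is illustrative):
> {code}
> #!/usr/bin/env python
> # Simplified sketch of the cleanup approach: find rows in
> # system_schema.columns whose keys are runs of NUL bytes and delete them.
> from cassandra.cluster import Cluster
>
> def is_nul_run(value):
>     # Corrupted keys showed up as strings consisting only of \x00 bytes.
>     return bool(value) and set(value) == {'\x00'}
>
> cluster = Cluster(['127.0.0.1'])  # illustrative contact point
> session = cluster.connect('system_schema')
>
> query = 'SELECT keyspace_name, table_name, column_name FROM columns'
> for row in session.execute(query):
>     if any(is_nul_run(v) for v in row):
>         print('deleting %s' % (row,))
>         session.execute(
>             'DELETE FROM columns WHERE keyspace_name = %s'
>             ' AND table_name = %s AND column_name = %s',
>             (row.keyspace_name, row.table_name, row.column_name))
>
> cluster.shutdown()
> {code}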
> That didn't actually remove all the records; some proved impossible to
> delete:
> {code}
> # Partial output from the query "select * from system_schema.columns"
> | regular | -1 | text
> system_distributed | \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 | \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 | none | exception_stacktrace | regular | -1 | text
> system_distributed | \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 | \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 | none | finished_at | regular | -1 | timestamp
> system_distributed | \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 | \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 | none | keyspace_name
> {code}
> ... so I just left those there, as they don't seem to impact the cluster
> other than spewing this error every minute:
> {code}
> ERROR [CompactionExecutor:20] 2016-07-04 14:19:59,798 CassandraDaemon.java:201 - Exception in thread Thread[CompactionExecutor:20,1,main]
> java.lang.AssertionError: Invalid clustering for the table: org.apache.cassandra.db.Clustering$2@661b79a
>     at org.apache.cassandra.db.Clustering$Serializer.serialize(Clustering.java:136) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:159) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:108) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.db.ColumnIndex$Builder.add(ColumnIndex.java:144) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.db.ColumnIndex$Builder.build(ColumnIndex.java:112) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.db.ColumnIndex.writeAndBuildIndex(ColumnIndex.java:52) ~[apache-cassandra-3.0.7.jar:3.0.7]
> {code}
> The cluster works fine now, minus the phantom rows and the once-a-minute
> error on every node. As for the two boxes that got killed, they were
> `removenode`d and added back somewhere in this process.
> ** Cluster 2 **
> Version: 3.0.6
> Hardware: 3 datacenters, 13 machines total
> Network: 1Gbit, <1ms within the dc, <50ms cross-dc
> This is a cluster I use for tests, which involves a lot of keyspace changes.
> While doing a 3.0.6 -> 3.0.7 upgrade this morning, I noticed that the first
> box I upgraded didn't come back. Instead of trying to fix it (or just
> rebuilding the cluster), I left it like that and am now filing this report.
> Startup on this node fails with:
> {code}
> ERROR [main] 2016-07-04 09:58:44,306 CassandraDaemon.java:698 - Exception encountered during startup
> java.lang.AssertionError: null
>     at org.apache.cassandra.config.ColumnDefinition.<init>(ColumnDefinition.java:155) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.schema.SchemaKeyspace.createColumnFromRow(SchemaKeyspace.java:1015) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.schema.SchemaKeyspace.lambda$fetchColumns$12(SchemaKeyspace.java:995) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at java.lang.Iterable.forEach(Iterable.java:75) ~[na:1.8.0_77]
>     at org.apache.cassandra.schema.SchemaKeyspace.fetchColumns(SchemaKeyspace.java:995) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.schema.SchemaKeyspace.fetchTable(SchemaKeyspace.java:949) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.schema.SchemaKeyspace.fetchTables(SchemaKeyspace.java:928) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspace(SchemaKeyspace.java:891) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspacesWithout(SchemaKeyspace.java:868) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.schema.SchemaKeyspace.fetchNonSystemKeyspaces(SchemaKeyspace.java:856) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:136) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:126) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:235) [apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:557) [apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:685) [apache-cassandra-3.0.7.jar:3.0.7]
> {code}
> `nodetool status -r`:
> {code}
> Datacenter: One
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address                      Load      Tokens  Owns (effective)  Host ID                               Rack
> DN  cassandra1.dc1.mydomain.com  10.38 MB  256     6.8%              7470d016-9a45-4e00-819a-77d7e09a14a2  1r1
> UN  cassandra2.dc1.mydomain.com  8.64 MB   256     7.3%              cb93240d-b1c6-47f0-a1bb-59e4ae127a1f  1r2
> UN  cassandra3.dc1.mydomain.com  11.32 MB  256     7.6%              ff6b3342-8142-42ba-8dd0-da00cd4ae95f  1r3
> UN  cassandra4.dc1.mydomain.com  12.46 MB  256     7.2%              91fad227-b394-4e25-be65-0f34a9dbbf9b  1r4
> UN  cassandra5.dc1.mydomain.com  12.03 MB  256     8.4%              74d98f17-df0b-40f2-b23b-7c6e5f49c2d7  1r5
> Datacenter: Two
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address                      Load      Tokens  Owns (effective)  Host ID                               Rack
> UN  cassandra1.dc2.mydomain.com  10.39 MB  256     7.8%              f49efc68-d530-4074-912a-b008f578c9d0  2r1
> UN  cassandra2.dc2.mydomain.com  8.23 MB   256     8.5%              b339a66e-4ef7-43c2-9507-9ac23dd7ad5c  2r2
> UN  cassandra3.dc2.mydomain.com  10.34 MB  256     7.2%              28d51ab8-5ee2-41a7-9e93-247fdf9f6d85  2r3
> Datacenter: Three
> =================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address                      Load      Tokens  Owns (effective)  Host ID                               Rack
> UN  cassandra1.dc3.mydomain.com  9.47 MB   256     7.2%              bbd06f32-6d40-49f4-b71c-30227aac20f1  3r1
> UN  cassandra2.dc3.mydomain.com  9.88 MB   256     7.7%              2789cffd-db20-47b9-962e-193326660345  3r2
> UN  cassandra3.dc3.mydomain.com  11.36 MB  256     8.5%              9a11ad49-112b-4b43-b937-f5e12176d725  3r3
> UN  cassandra4.dc3.mydomain.com  11.77 MB  256     7.6%              1009f985-2229-45c6-88c5-64ee508c4c3c  3r4
> UN  cassandra5.dc3.mydomain.com  11.11 MB  256     7.9%              4cbac3e8-c412-4375-ba2b-354a0bd81df8  3r5
> {code}
> `nodetool describecluster`:
> {code}
> Cluster Information:
>     Name: my_cluster
>     Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
>     Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>     Schema versions:
>         c478ac2f-c773-370c-aeca-d1a7169ad092: [10.xxx.xxx.xxx]
>         35f98fc6-3ddc-3358-9c92-d5a251ebc844: [10.xxx.xxx.xxx]
>         a1573012-90a1-303f-81af-2ddc387cfc98: [10.xxx.xxx.xxx]
>         c4a86820-60ea-371e-a24c-31b2040d18f1: [10.xxx.xxx.xxx]
>         1a734c68-c72f-3f0e-ac51-6fadc7854447: [10.xxx.xxx.xxx]
>         5042d7d8-c1d2-334c-95ce-443260401940: [10.xxx.xxx.xxx]
>         dfc67ce1-5422-30e8-a533-9c2f0c2f7ad9: [10.xxx.xxx.xxx]
>         0f32b476-0e6f-3064-8795-5d8adc2b3704: [10.xxx.xxx.xxx]
>         31b66ee1-9447-39ff-9953-bad4b01ba87b: [10.xxx.xxx.xxx]
>         7bb3cee9-eef5-356a-b435-9500550fda00: [10.xxx.xxx.xxx]
>         6adcfe50-2a16-3bc5-93d0-006481c6217e: [10.xxx.xxx.xxx]
>         5bb7c619-3e64-3ae0-b50e-8a6b5af78b1a: [10.xxx.xxx.xxx]
>         UNREACHABLE: [10.xxx.xxx.xxx]
> {code}
> Like the other cluster, this cluster has a corrupted system_schema. Partial
> output from "select * from system_schema.keyspaces":
> {code}
>  keyspace_name                                | durable_writes | replication
> ----------------------------------------------+----------------+-----------------------------------------------------------
>                                   system_auth |           True | {'One': '5', 'Two': '3', 'Three': '5', 'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy'}
>                                 system_schema |           True | {'class': 'org.apache.cassandra.locator.LocalStrategy'}
>  \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 |          False | {}
> {code}
> The cluster is still up and able to take reads and writes. In cqlsh, `desc
> keyspaces` shows an additional keyspace that pretends to be an empty string:
> {code}
> cassandra@cqlsh> desc keyspaces;
>
> system_distributed  system_schema  system  system_traces  ""  system_auth
> {code}
> Very curious.
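> Presumably that "empty" name is the same NUL-byte corruption again; a query
> like this should show the raw bytes (a sketch, not yet run against this
> cluster):
> {code}
> cassandra@cqlsh> select textAsBlob(keyspace_name) from system_schema.keyspaces;
> {code}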
> I left the second cluster in this state so that I can answer questions here
> on Jira, if needed.
> This issue can potentially destroy a cluster, so I'm marking this as
> critical. The fix for broken nodes seems to be: run a scrub on every
> affected node, then run my fix.py against every node and every table in
> system_schema.
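> In other words, per affected node, roughly this (a sketch; the exact fix.py
> invocation depends on the attached script):
> {code}
> nodetool scrub system_schema   # rewrite the corrupted sstables first
> python fix.py                  # then delete the NUL-byte rows, once per system_schema table
> nodetool describecluster       # verify that schema versions converge again
> {code}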
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)