[
https://issues.apache.org/jira/browse/CASSANDRA-12131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Philip Thompson updated CASSANDRA-12131:
----------------------------------------
Fix Version/s: 3.0.x
> system_schema corruption causing nodes to not restart
> -----------------------------------------------------
>
> Key: CASSANDRA-12131
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12131
> Project: Cassandra
> Issue Type: Bug
> Reporter: Tom van der Woerdt
> Priority: Critical
> Fix For: 3.0.x
>
>
> Symptoms:
> * Existing nodes fail to restart
> * system_schema has broken data
> * `nodetool describecluster` shows a full disagreement
> This happened on two clusters I manage, and so far I have no idea why. I'll
> describe the symptoms and what I did to (partially) resolve this, in the hope
> that the actual bug can get fixed.
> All clusters run the binary distribution from cassandra.apache.org. One
> cluster runs on CentOS 6, the other on CentOS 7, both with Java 8u77. The
> issue was seen on version 3.0.4 and during an upgrade from 3.0.6 to 3.0.7.
> ** Cluster 1 **
> Version: 3.0.4
> Hardware: 2 datacenters, 3 machines each
> Network: 1Gbit, <1ms within the dc, <20ms cross-dc
> This happened several months ago. I found out the hard way that every node
> had a different schema_version when I tried to restart a node and it didn't
> come back. Assuming it was just a single unhappy node, I ignored it and
> restarted a second node (in a different datacenter), which also did not come
> back.
> I like my quorums, so I didn't restart the other nodes. `nodetool
> describecluster` showed that every node had a different schema version.
> Querying system_schema showed a lot of records with their keys set to
> `\0\0\0\0(...)\0\0`. Cassandra logs indicated corrupted data, which was
> supposedly fixed by running scrub.
> Of course that didn't actually fix the data, so using CQL I removed most of
> the rows in system_schema that looked wrong. After doing that, `nodetool
> describecluster` agreed on a schema version again. I've attached the python
> script I used to remove the records from the `columns` table (fix.py);
> similar scripts were used for the other tables.
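> For reference, the approach boils down to roughly this (a simplified sketch,
> not the attached script verbatim; it assumes the DataStax python driver and,
> as described above, a node that accepts CQL deletes against system_schema;
> the contact point is illustrative):
> {code}
> #!/usr/bin/env python
> # Simplified sketch of the cleanup approach: find rows in
> # system_schema.columns whose keys are runs of NUL bytes and delete them.
> from cassandra.cluster import Cluster
>
> def is_nul_run(value):
>     # Corrupted keys showed up as strings consisting only of \x00 bytes.
>     return bool(value) and set(value) == {'\x00'}
>
> cluster = Cluster(['127.0.0.1'])  # illustrative contact point
> session = cluster.connect('system_schema')
>
> query = 'SELECT keyspace_name, table_name, column_name FROM columns'
> for row in session.execute(query):
>     if any(is_nul_run(v) for v in row):
>         print('deleting %s' % (row,))
>         session.execute(
>             'DELETE FROM columns WHERE keyspace_name = %s'
>             ' AND table_name = %s AND column_name = %s',
>             (row.keyspace_name, row.table_name, row.column_name))
>
> cluster.shutdown()
> {code}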
> That didn't actually remove all the records; some proved impossible to
> delete:
> {code}
> # Partial output from the query "select * from system_schema.columns"
> | regular | -1 | text
> system_distributed | \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 | \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 | none | exception_stacktrace | regular | -1 | text
> system_distributed | \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 | \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 | none | finished_at | regular | -1 | timestamp
> system_distributed | \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 | \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 | none | keyspace_name
> {code}
> ... so I just left those there, as they don't seem to impact the cluster
> other than spewing this error every minute:
> {code}
> ERROR [CompactionExecutor:20] 2016-07-04 14:19:59,798 CassandraDaemon.java:201 - Exception in thread Thread[CompactionExecutor:20,1,main]
> java.lang.AssertionError: Invalid clustering for the table: org.apache.cassandra.db.Clustering$2@661b79a
>     at org.apache.cassandra.db.Clustering$Serializer.serialize(Clustering.java:136) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:159) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:108) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.db.ColumnIndex$Builder.add(ColumnIndex.java:144) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.db.ColumnIndex$Builder.build(ColumnIndex.java:112) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.db.ColumnIndex.writeAndBuildIndex(ColumnIndex.java:52) ~[apache-cassandra-3.0.7.jar:3.0.7]
> {code}
> The cluster works fine now, minus the phantom rows and the once-a-minute
> error on every node. As for the two boxes that got killed, they were
> `removenode`d and added back somewhere in this process.
> ** Cluster 2 **
> Version: 3.0.6
> Hardware: 3 datacenters, 13 machines total
> Network: 1Gbit, <1ms within the dc, <50ms cross-dc
> This is a cluster I use for tests, which involves a lot of keyspace changes.
> While doing a 3.0.6 -> 3.0.7 upgrade this morning, I noticed that the first
> box I upgraded didn't come back. Instead of trying to fix it (or just
> rebuilding the cluster), I left it like that and am now filing this report.
> Startup on this node fails with:
> {code}
> ERROR [main] 2016-07-04 09:58:44,306 CassandraDaemon.java:698 - Exception encountered during startup
> java.lang.AssertionError: null
>     at org.apache.cassandra.config.ColumnDefinition.<init>(ColumnDefinition.java:155) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.schema.SchemaKeyspace.createColumnFromRow(SchemaKeyspace.java:1015) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.schema.SchemaKeyspace.lambda$fetchColumns$12(SchemaKeyspace.java:995) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at java.lang.Iterable.forEach(Iterable.java:75) ~[na:1.8.0_77]
>     at org.apache.cassandra.schema.SchemaKeyspace.fetchColumns(SchemaKeyspace.java:995) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.schema.SchemaKeyspace.fetchTable(SchemaKeyspace.java:949) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.schema.SchemaKeyspace.fetchTables(SchemaKeyspace.java:928) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspace(SchemaKeyspace.java:891) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspacesWithout(SchemaKeyspace.java:868) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.schema.SchemaKeyspace.fetchNonSystemKeyspaces(SchemaKeyspace.java:856) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:136) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:126) ~[apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:235) [apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:557) [apache-cassandra-3.0.7.jar:3.0.7]
>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:685) [apache-cassandra-3.0.7.jar:3.0.7]
> {code}
> `nodetool status -r`:
> {code}
> Datacenter: One
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address                      Load      Tokens  Owns (effective)  Host ID                               Rack
> DN  cassandra1.dc1.mydomain.com  10.38 MB  256     6.8%              7470d016-9a45-4e00-819a-77d7e09a14a2  1r1
> UN  cassandra2.dc1.mydomain.com  8.64 MB   256     7.3%              cb93240d-b1c6-47f0-a1bb-59e4ae127a1f  1r2
> UN  cassandra3.dc1.mydomain.com  11.32 MB  256     7.6%              ff6b3342-8142-42ba-8dd0-da00cd4ae95f  1r3
> UN  cassandra4.dc1.mydomain.com  12.46 MB  256     7.2%              91fad227-b394-4e25-be65-0f34a9dbbf9b  1r4
> UN  cassandra5.dc1.mydomain.com  12.03 MB  256     8.4%              74d98f17-df0b-40f2-b23b-7c6e5f49c2d7  1r5
> Datacenter: Two
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address                      Load      Tokens  Owns (effective)  Host ID                               Rack
> UN  cassandra1.dc2.mydomain.com  10.39 MB  256     7.8%              f49efc68-d530-4074-912a-b008f578c9d0  2r1
> UN  cassandra2.dc2.mydomain.com  8.23 MB   256     8.5%              b339a66e-4ef7-43c2-9507-9ac23dd7ad5c  2r2
> UN  cassandra3.dc2.mydomain.com  10.34 MB  256     7.2%              28d51ab8-5ee2-41a7-9e93-247fdf9f6d85  2r3
> Datacenter: Three
> =================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address                      Load      Tokens  Owns (effective)  Host ID                               Rack
> UN  cassandra1.dc3.mydomain.com  9.47 MB   256     7.2%              bbd06f32-6d40-49f4-b71c-30227aac20f1  3r1
> UN  cassandra2.dc3.mydomain.com  9.88 MB   256     7.7%              2789cffd-db20-47b9-962e-193326660345  3r2
> UN  cassandra3.dc3.mydomain.com  11.36 MB  256     8.5%              9a11ad49-112b-4b43-b937-f5e12176d725  3r3
> UN  cassandra4.dc3.mydomain.com  11.77 MB  256     7.6%              1009f985-2229-45c6-88c5-64ee508c4c3c  3r4
> UN  cassandra5.dc3.mydomain.com  11.11 MB  256     7.9%              4cbac3e8-c412-4375-ba2b-354a0bd81df8  3r5
> {code}
> `nodetool describecluster`:
> {code}
> Cluster Information:
>     Name: my_cluster
>     Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
>     Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>     Schema versions:
>         c478ac2f-c773-370c-aeca-d1a7169ad092: [10.xxx.xxx.xxx]
>         35f98fc6-3ddc-3358-9c92-d5a251ebc844: [10.xxx.xxx.xxx]
>         a1573012-90a1-303f-81af-2ddc387cfc98: [10.xxx.xxx.xxx]
>         c4a86820-60ea-371e-a24c-31b2040d18f1: [10.xxx.xxx.xxx]
>         1a734c68-c72f-3f0e-ac51-6fadc7854447: [10.xxx.xxx.xxx]
>         5042d7d8-c1d2-334c-95ce-443260401940: [10.xxx.xxx.xxx]
>         dfc67ce1-5422-30e8-a533-9c2f0c2f7ad9: [10.xxx.xxx.xxx]
>         0f32b476-0e6f-3064-8795-5d8adc2b3704: [10.xxx.xxx.xxx]
>         31b66ee1-9447-39ff-9953-bad4b01ba87b: [10.xxx.xxx.xxx]
>         7bb3cee9-eef5-356a-b435-9500550fda00: [10.xxx.xxx.xxx]
>         6adcfe50-2a16-3bc5-93d0-006481c6217e: [10.xxx.xxx.xxx]
>         5bb7c619-3e64-3ae0-b50e-8a6b5af78b1a: [10.xxx.xxx.xxx]
>         UNREACHABLE: [10.xxx.xxx.xxx]
> {code}
> Like the other cluster, this cluster has a corrupted system_schema. Partial
> output from "select * from system_schema.keyspaces":
> {code}
>  keyspace_name                                | durable_writes | replication
> ----------------------------------------------+----------------+-----------------------------------------------------------
>                                   system_auth |           True | {'One': '5', 'Two': '3', 'Three': '5', 'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy'}
>                                 system_schema |           True | {'class': 'org.apache.cassandra.locator.LocalStrategy'}
>  \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 |          False | {}
> {code}
> The cluster is still up and able to take reads and writes. In cqlsh, `desc
> keyspaces` shows an additional keyspace that pretends to be an empty string:
> {code}
> cassandra@cqlsh> desc keyspaces;
>
> system_distributed  system_schema  system  system_traces  ""  system_auth
> {code}
> Very curious.
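> Presumably that "empty" name is the same NUL-byte corruption again; a query
> like this should show the raw bytes (a sketch, not yet run against this
> cluster):
> {code}
> cassandra@cqlsh> select textAsBlob(keyspace_name) from system_schema.keyspaces;
> {code}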
> I left the second cluster in this state so that I can answer questions here
> on Jira, if needed.
> This issue can potentially destroy a cluster, so I'm marking this as
> critical. The fix for broken nodes seems to be: run a scrub on every
> affected node, then run my fix.py against every node and every table in
> system_schema.
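> In other words, per affected node, roughly this (a sketch; the exact fix.py
> invocation depends on the attached script):
> {code}
> nodetool scrub system_schema   # rewrite the corrupted sstables first
> python fix.py                  # then delete the NUL-byte rows, once per system_schema table
> nodetool describecluster       # verify that schema versions converge again
> {code}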
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)