[ https://issues.apache.org/jira/browse/CASSANDRA-14957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739427#comment-16739427 ]
Avraham Kalvo edited comment on CASSANDRA-14957 at 1/10/19 2:09 PM:
--------------------------------------------------------------------

To be clear, here is the timeline of the incident:

```
12:05:02 first node's state jumps to shutdown for restart
12:06:37 INFO Initializing tasks_scheduler_external.tasks (first node)
12:06:39 WARN UnknownColumnFamilyException reading from socket; closing (first node)
...
12:09:15 only trace of a service migration running, issuing the following:
         CREATE KEYSPACE IF NOT EXISTS tasks_scheduler_external WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'};
...
12:09:31 last node started after restart
```

Notice that *no tables* were created during the restart, and the keyspace was not recreated, as it already existed.

Hence the new version of the table, as visible in the file system, *has nothing to do* with any explicit DDL run before, during, or after the rolling restart. The schema (DDL) did not change; the data was simply split into a new version on the filesystem, which eventually became the version the cluster agreed on once the rolling restart completed.

Thank you.
Avi.
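An incident like the one in the timeline above can sometimes be caught early by pausing a rolling restart until the cluster reports a single schema version. Below is a minimal sketch, assuming `nodetool` is on the PATH and the `describecluster` output format of Cassandra 3.0.x; the parsing helper itself is plain text processing:

```shell
#!/bin/sh
# Count distinct schema versions reported by `nodetool describecluster`.
# A healthy cluster should report exactly one; proceed with the next
# node's restart only when this prints 1.
schema_version_count() {
  # Schema versions are listed as indented "<uuid>: [<ip>, ...]" lines.
  grep -cE '^[[:space:]]*[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}:'
}

# On a live node (not runnable here):
#   nodetool describecluster | schema_version_count
```

During the restart described above, this check between node restarts would have surfaced the schema split before the last node came back.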
> Rolling Restart Of Nodes Causes Dataloss Due To Schema Collision
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-14957
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14957
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Schema
>            Reporter: Avraham Kalvo
>            Priority: Major
>
> We were issuing a rolling restart on a mission-critical five-node C* cluster.
> The first node to be restarted logged the following messages in its system.log:
> ```
> January 2nd 2019, 12:06:37.310 - INFO 12:06:35 Initializing tasks_scheduler_external.tasks
> ```
> ```
> WARN 12:06:39 UnknownColumnFamilyException reading from socket; closing
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find table for cfId bd7200a0-1567-11e8-8974-855d74ee356f. If a table was just created, this is likely due to the schema not being fully propagated. Please wait for schema agreement on table creation.
>     at org.apache.cassandra.config.CFMetaData$Serializer.deserialize(CFMetaData.java:1336) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:660) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:635) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:330) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:349) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:286) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178) ~[apache-cassandra-3.0.10.jar:3.0.10]
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92) ~[apache-cassandra-3.0.10.jar:3.0.10]
> ```
>
> The latter was then repeated several times across the cluster.
>
> It was then found that the table in question, `tasks_scheduler_external.tasks`, had been created with a new schema version sometime during the cluster-wide consecutive restart and became available once schema agreement settled. The new version started taking requests, leaving the previous version of the schema unavailable for any request, thus causing data loss in our online system.
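The cfId in the exception above can be matched to a table directory on disk: in Cassandra 3.x a table's data directory is named `<table>-<cfId with dashes removed>`. A small sketch of that mapping follows; the data path is an assumption, and the `find` is left commented out since it needs a live node:

```shell
#!/bin/sh
# Turn the cfId from the log line into the 32-hex-digit suffix used
# in the table's on-disk directory name.
cfid='bd7200a0-1567-11e8-8974-855d74ee356f'   # from the UnknownColumnFamilyException
suffix=$(printf '%s' "$cfid" | tr -d '-')
echo "$suffix"   # bd7200a0156711e88974855d74ee356f

# On a live node, locate the matching directory (path is an assumption):
#   find /var/lib/cassandra/data -maxdepth 2 -type d -name "*-${suffix}"
```

With two directories for the same table name (old and new suffix), the schema split described in this report becomes directly visible in the filesystem.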
> Data loss was recovered by manually copying the SSTables from the previous schema version's directory into the new one, followed by `nodetool refresh` on the relevant table.
>
> The above repeated itself for several tables across various keyspaces.
>
> One other thing to mention: a repair was in progress on the first node to be restarted, which was obviously stopped when the daemon was shut down, but at first glance this does not seem related to the above.
>
> Seems somewhat related to:
> https://issues.apache.org/jira/browse/CASSANDRA-13559

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
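The recovery described in the report (copying SSTables from the previous schema version's directory into the new one, then `nodetool refresh`) can be sketched roughly as below. Keyspace, table, and path names are illustrative assumptions, and the destructive steps are left commented out:

```shell
#!/bin/sh
# Sketch of the SSTable recovery procedure from the report.
# All names and paths below are assumptions for illustration.
KS='tasks_scheduler_external'
TABLE='tasks'
DATA='/var/lib/cassandra/data'
OLD_ID='bd7200a0156711e88974855d74ee356f'   # cfId of the orphaned directory, dashes removed

old_dir="$DATA/$KS/$TABLE-$OLD_ID"
echo "$old_dir"

# On a live node:
#   new_dir=$(ls -d "$DATA/$KS/$TABLE-"* | grep -v "$OLD_ID" | head -n 1)
#   cp "$old_dir"/* "$new_dir"/        # copy all SSTable component files
#   nodetool refresh "$KS" "$TABLE"    # load the copied SSTables into the live table
```

`nodetool refresh` picks up SSTables placed in the live table's directory without a restart, which matches the recovery path the reporter describes.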