[ https://issues.apache.org/jira/browse/CASSANDRA-7582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077792#comment-14077792 ]
Benedict commented on CASSANDRA-7582:
-------------------------------------

Right, and this hole is down to a commit log (CL) bug that we would likely have caught had we had this enabled previously. A correctly functioning drain would not permit this to happen. However, this is really a non-issue: the only plausibly affected table is the system table we dropped, since otherwise there would have to be a table drop immediately before shutting down the node for upgrade. Schema disagreement during upgrade is a no-no anyway, so this should not happen. And since drain does sometimes work, the risk profile is reduced even further.

Further, we can relatively safely prevent almost all power-failure exceptions by introducing the change I suggested: ignore any errors when reading a commit log segment (CLS) if the header's hashes are consistent with the header's id (which we now have available to us) and this id is in the past, as this is obviously a recycled segment whose header had not yet been reset before the power failure. This leaves only those segments that managed to write just a partial block during a power failure, which will be dealt with by OS journalling (and should be impossible on SSDs anyway).

So I don't think there are any power-off risk scenarios left to warn about.

> 2.1 multi-dc upgrade errors
> ---------------------------
>
>                 Key: CASSANDRA-7582
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7582
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Ryan McGuire
>            Assignee: Benedict
>            Priority: Critical
>             Fix For: 2.1.1
>
>
> Multi-dc upgrade [was working from 2.0 -> 2.1 fairly recently|http://cassci.datastax.com/job/cassandra_upgrade_dtest/55/testReport/upgrade_through_versions_test/TestUpgrade_from_cassandra_2_0_latest_tag_to_cassandra_2_1_HEAD/],
> but is currently failing.
> Running
> upgrade_through_versions_test.py:TestUpgrade_from_cassandra_2_0_HEAD_to_cassandra_2_1_HEAD.bootstrap_multidc_test
> I get the following errors when starting 2.1 upgraded from 2.0:
> {code}
> ERROR [main] 2014-07-21 23:54:20,862 CommitLog.java:143 - Commit log replay failed due to replaying a mutation for a missing table. This error can be ignored by providing -Dcassandra.commitlog.stop_on_missing_tables=false on the command line
> ERROR [main] 2014-07-21 23:54:20,869 CassandraDaemon.java:474 - Exception encountered during startup
> java.lang.RuntimeException: org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=a1b676f3-0c5d-3276-bfd5-07cf43397004
>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:300) [main/:na]
>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:457) [main/:na]
>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:546) [main/:na]
> Caused by: org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=a1b676f3-0c5d-3276-bfd5-07cf43397004
>     at org.apache.cassandra.db.ColumnFamilySerializer.deserializeCfId(ColumnFamilySerializer.java:164) ~[main/:na]
>     at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:97) ~[main/:na]
>     at org.apache.cassandra.db.Mutation$MutationSerializer.deserializeOneCf(Mutation.java:353) ~[main/:na]
>     at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:333) ~[main/:na]
>     at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:365) ~[main/:na]
>     at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:98) ~[main/:na]
>     at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:137) ~[main/:na]
>     at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:115) ~[main/:na]
> {code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
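
For illustration only, the recycled-segment check proposed in the comment above might look roughly like the sketch below. Everything in it is an assumption made for the example: the class and method names are hypothetical, the header layout (an 8-byte segment id followed by a CRC32 of that id) is invented rather than taken from Cassandra's on-disk format, and "in the past" is read as "lower than the id the segment is expected to carry". It is not the patch attached to this ticket.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical sketch: names and header layout are assumptions, not Cassandra's code.
final class RecycledSegmentCheck {
    // Assumed layout: 8-byte segment id, then a 4-byte CRC32 computed over that id.
    static final int HEADER_SIZE = Long.BYTES + Integer.BYTES;

    // CRC32 over the big-endian bytes of the id, truncated to an int.
    static int checksumOf(long id) {
        CRC32 crc = new CRC32();
        crc.update(ByteBuffer.allocate(Long.BYTES).putLong(id).array());
        return (int) crc.getValue();
    }

    // Builds a header in the assumed layout (used here only to produce test input).
    static byte[] writeHeader(long id) {
        return ByteBuffer.allocate(HEADER_SIZE).putLong(id).putInt(checksumOf(id)).array();
    }

    // True when read errors in this segment could be ignored under the proposal:
    // the header checksum matches the header's own id, and that id is older than
    // the id the segment was expected to have, i.e. the segment was recycled but
    // its header was never reset.
    static boolean isStaleRecycledSegment(byte[] header, long expectedId) {
        if (header.length < HEADER_SIZE) {
            return false; // partial header: not covered by this check
        }
        ByteBuffer buf = ByteBuffer.wrap(header);
        long headerId = buf.getLong();
        int storedChecksum = buf.getInt();
        boolean selfConsistent = storedChecksum == checksumOf(headerId);
        return selfConsistent && headerId < expectedId;
    }

    public static void main(String[] args) {
        byte[] oldHeader = writeHeader(10L);
        // Segment expected to carry id 11 but still holding the header for id 10.
        System.out.println(isStaleRecycledSegment(oldHeader, 11L));
        // Header id matches the expected id: errors here are real and must surface.
        System.out.println(isStaleRecycledSegment(oldHeader, 10L));
    }
}
```

A header that fails its own checksum is deliberately not treated as ignorable in this sketch; that corresponds to the partial-block case the comment leaves to OS journalling.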