Jackson Chung created CASSANDRA-5924:
----------------------------------------
Summary: If migration (upgrade) failed mid-way, some data will be
"lost" on the upgraded instance
Key: CASSANDRA-5924
URL: https://issues.apache.org/jira/browse/CASSANDRA-5924
Project: Cassandra
Issue Type: Bug
Reporter: Jackson Chung
When upgrading from 1.0 to 1.1, C* checks the system keyspace
(schema_keyspaces) to see whether a migration is needed.
When it is, it proceeds with migrateSSTables.
But this process runs in no particular order (File.listFiles() does not
guarantee any ordering), and an IOException can be thrown (e.g. on failure to
create a directory).
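
To illustrate the ordering point, here is a minimal sketch, not the actual
Cassandra code. The class and method names are hypothetical. It sorts the
result of File.listFiles() so that the order in which directories are visited
is at least deterministic from one start to the next:

```java
import java.io.File;
import java.util.Arrays;

class SortedListing {
    // Hypothetical helper: returns the children of dir sorted by pathname.
    // File.listFiles() returns null when dir is not a directory or an I/O
    // error occurs, so that case is mapped to an empty array here.
    static File[] listSorted(File dir) {
        File[] children = dir.listFiles();
        if (children == null)
            return new File[0];
        Arrays.sort(children); // File implements Comparable<File>
        return children;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (File child : listSorted(dir))
            System.out.println(child.getName());
    }
}
```

A fixed order does not prevent the failure described here, but it would make
the set of already-migrated directories predictable when a failure happens.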
In some of our upgrades, system was migrated first, followed by some KSs/CFs,
but before it finished all the KSs/CFs, it failed on a custom directory
containing files whose names resemble the sstable file naming convention (they
contain "-").
These files really shouldn't be there, and we are removing them. But because
of them, C* tried to create a directory for such a file and failed with an
IOException due to ownership/permissions. As a result, C* failed to start.
Before we knew why C* had failed to start in the first place, C* was
restarted. Only this time C* did not think it needed to migrate any more
(system had already been migrated, so schema_keyspaces exists). As a result,
the remaining KSs/CFs were never migrated.
Our root cause was the custom directory and its ownership/permissions, and
again we are removing it before re-upgrading. But the point of this jira is
that an IOException (or any other exception) can still be thrown for various
reasons during this process and lead to the same problem: some CFs fail to be
migrated.
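
As an illustration only, here is a sketch of one way the "is a migration
needed" decision could be decoupled from whether system was already migrated.
This is not Cassandra's code: the marker file name, the Step interface and the
class are all hypothetical. The per-directory step is assumed to be safe to
re-run on a directory that was already migrated.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ResumableMigration {
    // Hypothetical marker file name, not an actual Cassandra file.
    static final String MARKER = "migration-complete.marker";

    // Hypothetical stand-in for the real per-directory sstable migration.
    interface Step {
        void migrate(File dir) throws IOException;
    }

    // Migration is considered pending until the marker exists, regardless
    // of which keyspaces happen to have been migrated already.
    static boolean needsMigration(File dataRoot) {
        return !new File(dataRoot, MARKER).exists();
    }

    // Tries every directory and collects failures instead of aborting on
    // the first one. The marker is written only when nothing failed, so a
    // restart after a partial failure runs the migration again.
    static List<File> migrateAll(File dataRoot, Step step) {
        File[] dirs = dataRoot.listFiles();
        if (dirs == null)
            throw new IllegalStateException("Cannot list " + dataRoot);
        Arrays.sort(dirs);
        List<File> failed = new ArrayList<>();
        for (File dir : dirs) {
            if (!dir.isDirectory())
                continue;
            try {
                step.migrate(dir);
            } catch (IOException e) {
                failed.add(dir);
            }
        }
        if (failed.isEmpty()) {
            try {
                new File(dataRoot, MARKER).createNewFile();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: ResumableMigration <data-root>");
            return;
        }
        File root = new File(args[0]);
        if (needsMigration(root))
            System.out.println("failed: " + migrateAll(root, dir -> {}));
    }
}
```

With something along these lines, the restart described above would see that
the marker is missing and retry the remaining KSs/CFs instead of skipping them.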
1.2 seems to have some handling code, but it looks like a RuntimeException
would still be thrown, and that would still be caught by
AbstractCassandraDaemon (or CassandraDaemon in 1.2):
{code}
catch (Throwable e)
{
    logger.error("Exception encountered during startup", e);
    // try to warn user on stdout too, if we haven't already detached
    e.printStackTrace();
    System.out.println("Exception encountered during startup: " + e.getMessage());
    System.exit(3);
}
{code}
And so I think this problem still exists in 1.2.