[
https://issues.apache.org/jira/browse/CASSANDRA-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Shuler resolved CASSANDRA-5924.
---------------------------------------
Resolution: Not a Problem
Closing as not a problem: the failure was caused by unexpected files in a
location where Cassandra expects only its own data. Feel free to re-open with
some concrete reproduction steps on the latest version of 1.2.x or 2.0.x, if
you would like to pursue this further. Thanks!
> If migration (upgrade) failed mid-way, some data will be "lost" on the
> upgraded instance
> ----------------------------------------------------------------------------------------
>
> Key: CASSANDRA-5924
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5924
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jackson Chung
>
> When upgrading from 1.0 to 1.1, C* checks the system keyspace
> (schema_keyspaces) to see if a migration is needed.
> When it is needed, it proceeds with migrateSSTables.
> But this process does not run in any particular order (File.listFiles() does
> not guarantee ordering), and an IOException can be thrown (e.g. on failure to
> create a directory).
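
To illustrate the ordering point: File.listFiles() documents no ordering
guarantee, so any code that depends on a stable traversal order has to sort
the result itself. The following is only a minimal sketch; the class and
method names (SortedListing, listSorted) are hypothetical and are not taken
from the Cassandra source. Sorting does not prevent a mid-way failure, it only
makes the order in which directories are visited reproducible.

```java
import java.io.File;
import java.util.Arrays;

public class SortedListing {
    /**
     * Returns the entries of dir sorted by pathname, or an empty array
     * if dir cannot be listed. Hypothetical helper, for illustration only.
     */
    public static File[] listSorted(File dir) {
        // listFiles() returns null if dir is not a directory or an I/O error occurs
        File[] entries = dir.listFiles();
        if (entries == null) {
            return new File[0];
        }
        // File implements Comparable<File>, so the natural order is by pathname
        Arrays.sort(entries);
        return entries;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (File f : listSorted(dir)) {
            System.out.println(f.getName());
        }
    }
}
```
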
> In some of our upgrades, system was migrated first, followed by some KSs/CFs,
> but before all the KSs/CFs were finished, the migration failed on a custom
> directory containing files whose names resemble the sstable file naming
> convention (they contain "-").
> They really shouldn't be there and we are removing them. But because of
> them, C* tried to create a directory for such a file, which failed with an
> IOException because of ownership/permissions. As a result C* failed to start.
> Without anyone knowing why C* had failed to start in the first place, C* was
> restarted. Only this time C* did not think it needed to migrate any more
> (system was already migrated, so schema_keyspaces exists). As a result, the
> remaining KSs/CFs were never migrated.
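
The sequence described above can be made concrete with a small in-memory
simulation. This is a sketch under stated assumptions, not Cassandra's actual
migration code: the class PartialMigrationDemo, its methods, and the keyspace
names are all hypothetical, and a Set stands in for the on-disk state. The
only behaviour it borrows from the report is that the "migration needed"
check looks at the system keyspace alone.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PartialMigrationDemo {
    /** Keyspaces already moved to the new layout (stands in for on-disk state). */
    public final Set<String> migrated = new LinkedHashSet<>();

    /** Mimics the reported check: migration is "needed" only while system is unmigrated. */
    public boolean migrationNeeded() {
        return !migrated.contains("system");
    }

    /** Migrates in the given order and throws when it reaches the entry named by failOn. */
    public void migrateAll(List<String> keyspaces, String failOn) {
        for (String ks : keyspaces) {
            if (ks.equals(failOn)) {
                throw new IllegalStateException("cannot create directory for " + ks);
            }
            migrated.add(ks);
        }
    }

    /** One startup attempt; returns true if the node came up. */
    public boolean startup(List<String> keyspaces, String failOn) {
        if (migrationNeeded()) {
            try {
                migrateAll(keyspaces, failOn);
            } catch (IllegalStateException e) {
                return false; // stands in for the daemon exiting during startup
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> keyspaces = Arrays.asList("system", "ks1", "custom", "ks2");
        PartialMigrationDemo node = new PartialMigrationDemo();
        System.out.println("first start ok:  " + node.startup(keyspaces, "custom"));
        System.out.println("second start ok: " + node.startup(keyspaces, "custom"));
        System.out.println("ks2 migrated:    " + node.migrated.contains("ks2"));
    }
}
```

In this simulation the first start fails on "custom" after "system" and "ks1"
have been migrated; the second start skips migration entirely because
"system" is already present, so "ks2" is never migrated.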
> Our root cause is the custom directory and its ownership/permissions, and
> again we are removing it in order to re-upgrade. But the point of this jira is
> that an IOException (or any other exception) can still be thrown for various
> reasons during this process, and can result in the same problem: some CFs
> fail to be migrated.
> 1.2 seems to have some handling code, but it looks like a RuntimeException
> would still be thrown, and it would still be caught by
> AbstractCassandraDaemon (or CassandraDaemon in 1.2):
> {code}
> catch (Throwable e)
> {
>     logger.error("Exception encountered during startup", e);
>     // try to warn user on stdout too, if we haven't already detached
>     e.printStackTrace();
>     System.out.println("Exception encountered during startup: " + e.getMessage());
>     System.exit(3);
> }
> {code}
> And so I think this problem still exists in 1.2.