Jackson Chung created CASSANDRA-5924:
----------------------------------------

             Summary: If migration (upgrade) failed mid-way, some data will be 
"lost" on the upgraded instance
                 Key: CASSANDRA-5924
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5924
             Project: Cassandra
          Issue Type: Bug
            Reporter: Jackson Chung


When upgrading from 1.0 to 1.1, C* checks the system keyspace 
(schema_keyspaces) to see whether a migration is needed.

When it is needed, it proceeds with migrateSSTables.

But this process does not run in any particular order (File.listFiles() does 
not guarantee any ordering), and an IOException can be thrown (e.g. on failure 
to create a directory).
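To illustrate the ordering point, here is a minimal sketch of listing a directory in a deterministic order. The class and method names (SortedListing, listSorted) are made up for this example and are not taken from the Cassandra code base. Sorting alone does not fix the partial-migration problem, but it would at least make the order of migration reproducible between runs.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical helper, not part of Cassandra.
final class SortedListing {
    /**
     * Returns the children of dir sorted by pathname.
     * File.listFiles() returns null when the directory cannot be listed,
     * so that case is turned into an IOException here.
     */
    static File[] listSorted(File dir) throws IOException {
        File[] children = dir.listFiles();
        if (children == null)
            throw new IOException("Cannot list directory " + dir);
        // File implements Comparable<File>, comparing abstract pathnames.
        Arrays.sort(children);
        return children;
    }
}
```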

In some of our upgrades, the system keyspace was migrated first, followed by 
some KSs/CFs, but before it finished all the KSs/CFs, it failed on a custom 
directory containing files whose names resemble the sstable file naming 
convention (they contain "-"). 

They really shouldn't be there and we are removing them. But because of them, 
C* tried to create a directory for such a file and failed with an IOException 
because of ownership/permissions. As a result C* failed to start.

C* was then restarted without anyone knowing why it had failed to start in the 
first place. Only this time C* did not think it needed to migrate any more 
(system was already migrated, so schema_keyspaces exists). As a result, the 
remaining KSs/CFs were never migrated.

Our root cause was the custom directory and its ownership/permissions, and 
again we are removing them and re-upgrading. But the point of this jira is that 
an IOException (or any other exception) can still be thrown for various reasons 
during this process, and can result in the same problem: some CFs fail to be 
migrated.
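One possible way to make such a process safe to retry is sketched below. It is only an illustration under stated assumptions, not how Cassandra decides whether to migrate: the "migration needed" check is based on a marker file written only after every step has succeeded, instead of on a side effect of the first step. All names here (ResumableMigration, Step, migrateIfNeeded, the marker path) are hypothetical, and each step is assumed to be idempotent so that it can be re-run after a failure.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch, not part of Cassandra.
final class ResumableMigration {
    /** One unit of migration work, e.g. moving the sstables of one CF. */
    interface Step {
        void run() throws IOException;
    }

    /**
     * Runs every step unless the marker already exists.
     * The marker is created only after all steps succeed, so a failure
     * part-way through leaves it absent and the next start retries.
     * Returns true if the steps were run, false if they were skipped.
     */
    static boolean migrateIfNeeded(Path marker, List<Step> steps) throws IOException {
        if (Files.exists(marker))
            return false;
        for (Step step : steps)
            step.run(); // assumed idempotent: may run again after a failure
        Files.createFile(marker);
        return true;
    }
}
```

With this shape, a restart after a failed startup would attempt the remaining work again rather than silently skipping it.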

1.2 seems to have some handling code, but it looks like a RuntimeException 
would still be thrown, and that would still be caught by 
AbstractCassandraDaemon (or CassandraDaemon in 1.2):

{code}
        catch (Throwable e)
        {
            logger.error("Exception encountered during startup", e);

            // try to warn user on stdout too, if we haven't already detached
            e.printStackTrace();
            System.out.println("Exception encountered during startup: " + e.getMessage());

            System.exit(3);
        }
{code}

And so I think this problem still exists in 1.2.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
