[ 
https://issues.apache.org/jira/browse/CASSANDRA-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168969#comment-13168969
 ] 

Jonathan Ellis commented on CASSANDRA-1391:
-------------------------------------------

Thanks, Pavel.  This is getting closer.  But I think continuing to use UUIDs is 
the wrong approach.  In particular, code like this means we've failed to 
achieve our goal:

{code}
// Rejects any migration whose timestamp is not newer than the last one applied locally
if (newVersion.timestamp() <= lastVersion.timestamp())
    throw new ConfigurationException("New version timestamp is not newer than the current version timestamp.");
{code}

If two migrations X and Y propagate through the cluster concurrently from 
different coordinators, some nodes will apply X first, some Y; whichever 
migration has a lower timestamp will then error out on the remaining nodes and 
we'll end up with the same kind of version conflict snafu we encounter now.
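To make the failure concrete, here is a toy Java model of the check quoted above. `TimestampGateDemo` and `Node` are hypothetical names, and the model returns `false` where the patch throws `ConfigurationException`. It only illustrates how the outcome depends on arrival order; it is not Cassandra code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the timestamp gate quoted above; not Cassandra code.
final class TimestampGateDemo {
    static final class Node {
        private long lastVersionTimestamp = Long.MIN_VALUE;
        final List<String> applied = new ArrayList<>();

        // Returns false where the patch would throw ConfigurationException.
        boolean apply(String migration, long timestamp) {
            if (timestamp <= lastVersionTimestamp)
                return false;
            lastVersionTimestamp = timestamp;
            applied.add(migration);
            return true;
        }
    }

    public static void main(String[] args) {
        Node a = new Node(), b = new Node();
        // X (timestamp 100) and Y (timestamp 200) start on different coordinators.
        a.apply("X", 100); a.apply("Y", 200); // node A happens to receive X first
        b.apply("Y", 200); b.apply("X", 100); // node B receives Y first, so X is rejected
        System.out.println(a.applied + " vs " + b.applied); // [X, Y] vs [Y]
    }
}
```

Node B never applies X, so A and B end up with different schemas even though both saw the same two migrations.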

Here's how I think it should work:

* Coordinator turns KsDef and CfDef objects into RowMutations by applying them 
to the existing (local) schema.  You could reuse something like your 
attributesToCheck code, since you already have that written.  Give that mutation 
a normal local timestamp (FBUtilities.timestampMicros()).
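A minimal sketch of the coordinator step, assuming a definition can be flattened into a map of attribute name to string value (roughly what attributesToCheck compares). `CoordinatorSketch`, `Cell`, and the `null`-means-deleted convention are hypothetical; real code would build a RowMutation rather than a `Map`.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the coordinator step: turn a proposed definition into a
// timestamped "mutation" against the local one. Not Cassandra's API.
final class CoordinatorSketch {
    // value == null marks a deleted attribute (a tombstone, in Cassandra terms)
    static final class Cell {
        final String value;
        final long timestamp; // microseconds
        Cell(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    // Assumed stand-in for FBUtilities.timestampMicros(): wall-clock millis scaled to micros.
    static long timestampMicros() { return System.currentTimeMillis() * 1000; }

    static Map<String, Cell> toMutation(Map<String, String> local, Map<String, String> proposed, long timestamp) {
        Map<String, Cell> mutation = new TreeMap<>();
        for (Map.Entry<String, String> e : proposed.entrySet())
            if (!e.getValue().equals(local.get(e.getKey())))
                mutation.put(e.getKey(), new Cell(e.getValue(), timestamp)); // added or changed
        for (String name : local.keySet())
            if (!proposed.containsKey(name))
                mutation.put(name, new Cell(null, timestamp)); // removed
        return mutation;
    }

    public static void main(String[] args) {
        Map<String, String> local = Map.of("comment", "old", "gc_grace_seconds", "864000");
        Map<String, String> proposed = Map.of("comment", "new", "gc_grace_seconds", "864000");
        System.out.println(toMutation(local, proposed, timestampMicros()).keySet()); // [comment]
    }
}
```

Only changed or removed attributes go into the mutation, and all of them share the single coordinator timestamp.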

Then each node applying the change:
* makes a deep copy of the existing schema ColumnFamily objects
* calls Table.apply on the migration RowMutations
* calls ColumnFamily.diff on the new schema ColumnFamily object vs the copied 
one.  (This is where I was going above by saying "let the existing resolve code 
do the work."  No matter which order nodes apply X and Y in, they will always 
agree on the result after applying both.  Note that this does not depend on X 
and Y getting "correctly" ordered timestamps, either.)
* makes the appropriate Table + CFS + Schema changes dictated by the diff
* (the above obviously needs to be synchronized, at least against the Table/CFS 
objects affected)
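The node-side steps can be sketched with a simplified model. Here a schema row is a map of column name to an immutable `Cell`, and `reconcile` is a plain last-write-wins rule (higher timestamp wins, ties broken by value) standing in for Cassandra's column resolution. `SchemaMergeSketch`, `apply`, and `diff` are hypothetical analogues of Table.apply and ColumnFamily.diff, not their actual signatures.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical model of the node-side steps: copy, apply, diff. Not Cassandra's classes.
final class SchemaMergeSketch {
    record Cell(String value, long timestamp) {}

    // Simplified last-write-wins: higher timestamp wins, ties broken deterministically by value.
    static Cell reconcile(Cell a, Cell b) {
        if (a.timestamp() != b.timestamp())
            return a.timestamp() > b.timestamp() ? a : b;
        return a.value().compareTo(b.value()) >= 0 ? a : b;
    }

    // Analogue of Table.apply: merge a mutation into the live schema row.
    static void apply(Map<String, Cell> schema, Map<String, Cell> mutation) {
        for (Map.Entry<String, Cell> e : mutation.entrySet())
            schema.merge(e.getKey(), e.getValue(), SchemaMergeSketch::reconcile);
    }

    // Analogue of ColumnFamily.diff: columns whose winning cell changed.
    static Map<String, Cell> diff(Map<String, Cell> before, Map<String, Cell> after) {
        Map<String, Cell> changed = new TreeMap<>();
        for (Map.Entry<String, Cell> e : after.entrySet())
            if (!Objects.equals(before.get(e.getKey()), e.getValue()))
                changed.put(e.getKey(), e.getValue());
        return changed;
    }

    // One node handling one migration. Synchronized on the class here; real code
    // would lock the affected Table/CFS objects instead.
    static synchronized Map<String, Cell> applyMigration(Map<String, Cell> schema, Map<String, Cell> mutation) {
        Map<String, Cell> copy = new HashMap<>(schema); // Cells are immutable, so this copy is effectively deep
        apply(schema, mutation);
        return diff(copy, schema); // caller would make Table/CFS/Schema changes from this
    }

    public static void main(String[] args) {
        Map<String, Cell> x = Map.of("comment", new Cell("from X", 100));
        Map<String, Cell> y = Map.of("comment", new Cell("from Y", 200), "gc_grace_seconds", new Cell("3600", 200));
        Map<String, Cell> nodeA = new HashMap<>(), nodeB = new HashMap<>();
        applyMigration(nodeA, x); applyMigration(nodeA, y); // node A: X then Y
        applyMigration(nodeB, y); applyMigration(nodeB, x); // node B: Y then X
        System.out.println(nodeA.equals(nodeB)); // true
    }
}
```

Both nodes converge, and on node B the late-arriving X simply produces an empty diff instead of an error. Which migration's value wins depends on timestamps, but every node picks the same winner.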

Schema "version" may then be computed as an md5 of the Schema objects.  (Again: 
goal is that nodes can apply X and Y in any order, and we don't care.  So 
version needs to be entirely content-based, not time-based.)  Probably the 
easiest way to do this is to just use CF.updateDigest.  We can cut this down to 
the first 16 bytes if we need to cram it into a UUID, but I don't see a reason 
for that (the Thrift API uses Strings already).
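A sketch of a content-based version, assuming the schema can be viewed as a map of column name to value. It uses `java.security.MessageDigest` directly rather than ColumnFamily.updateDigest, and `SchemaVersionSketch` plus the name/value encoding with zero-byte separators are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical content-based schema version: MD5 over the columns in sorted order,
// so the result depends only on content, never on the order migrations arrived.
final class SchemaVersionSketch {
    static String version(Map<String, String> schemaColumns) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (Map.Entry<String, String> e : new TreeMap<>(schemaColumns).entrySet()) {
                md5.update(e.getKey().getBytes(StandardCharsets.UTF_8));
                md5.update((byte) 0); // separator so ("ab", "c") and ("a", "bc") hash differently
                md5.update(e.getValue().getBytes(StandardCharsets.UTF_8));
                md5.update((byte) 0);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest())
                hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("MD5 is required on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> afterXThenY = new LinkedHashMap<>();
        afterXThenY.put("comment", "from Y");
        afterXThenY.put("gc_grace_seconds", "3600");
        Map<String, String> afterYThenX = new LinkedHashMap<>();
        afterYThenX.put("gc_grace_seconds", "3600");
        afterYThenX.put("comment", "from Y");
        System.out.println(version(afterXThenY).equals(version(afterYThenX))); // true
        System.out.println(version(afterXThenY).length()); // 32 hex chars = 16 bytes
    }
}
```

Sorting the columns before hashing is what makes the version independent of application order. An MD5 digest is 16 bytes, which comes out as 32 hex characters when carried as a String.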

Nit: flushSystemCFs could use FBUtilities.waitOnFutures(flushes) instead of 
rolling its own multi-future wait.
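For reference, the suggested pattern looks roughly like this. `FlushSketch.waitOnFutures` is a hypothetical stand-in written for this example (it does not claim to match FBUtilities.waitOnFutures' exact signature or error handling), and this `flushSystemCFs` just submits arbitrary tasks to an executor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one shared "wait on all futures" helper, reused by callers.
final class FlushSketch {
    // Blocks until every future completes; rethrows the first failure unchecked.
    static void waitOnFutures(Iterable<? extends Future<?>> futures) {
        for (Future<?> f : futures) {
            try {
                f.get();
            } catch (ExecutionException e) {
                throw new RuntimeException(e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
    }

    // Hypothetical flushSystemCFs: submit one flush per CF, then wait once via the helper.
    static void flushSystemCFs(ExecutorService executor, List<Runnable> flushTasks) {
        List<Future<?>> flushes = new ArrayList<>();
        for (Runnable task : flushTasks)
            flushes.add(executor.submit(task));
        waitOnFutures(flushes);
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        List<Runnable> tasks = List.of(() -> System.out.println("flush CF 1"),
                                       () -> System.out.println("flush CF 2"));
        flushSystemCFs(executor, tasks);
        executor.shutdown();
    }
}
```

The wait loop lives in one helper instead of being re-implemented in each caller.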

                
> Allow Concurrent Schema Migrations
> ----------------------------------
>
>                 Key: CASSANDRA-1391
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1391
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Stu Hood
>            Assignee: Pavel Yaskevich
>             Fix For: 1.1
>
>         Attachments: 
> 0001-new-migration-schema-and-avro-methods-cleanup.patch, 
> 0002-avro-removal.patch, CASSANDRA-1391.patch
>
>
> CASSANDRA-1292 fixed multiple migrations started from the same node to 
> properly queue themselves, but it is still possible for migrations initiated 
> on different nodes to conflict and leave the cluster in a bad state. Since 
> the system_add/drop/rename methods are accessible directly from the client 
> API, they should be completely safe for concurrent use.
> It should be possible to allow for most types of concurrent migrations by 
> converting the UUID schema ID into a VersionVectorClock (as provided by 
> CASSANDRA-580).


        
