[Cassandra Wiki] Update of "LiveSchemaUpdates" by gdusbabek

Apache Wiki Wed, 07 Apr 2010 06:00:01 -0700

The "LiveSchemaUpdates" page has been changed by gdusbabek.
The comment on this change is: Reworking.
http://wiki.apache.org/cassandra/LiveSchemaUpdates?action=diff&rev1=2&rev2=3

--------------------------------------------------

  
  = Modifying Schema on a Live Cluster =
  
+ == Under the Hood ==
- == Client Operations ==
- Column family operations: add, drop, rename.
- 
- Keyspace operations: add, drop, rename.  
- 
- These are all executed via the Thrift interface.  It is expected that you 
have ALL access if you are using security.
- 
- === How it works ===
- A new system table called `definitions` keeps track of two things: keyspace 
definitions (`SCHEMA_CF`) and keyspace changes (MIGRATIONS_CF).  TimeUUIDs are 
used throughout to match migrations up with schema and vice-versa.
+ A new system table called `definitions` keeps track of two things: keyspace 
definitions (`SCHEMA_CF`) and keyspace changes (`MIGRATIONS_CF`).  TimeUUIDs 
are used throughout to match migrations up with schema and vice-versa.
  
  === Keyspace Definitions (SCHEMA_CF) ===
- All current keyspace definitions are stored in a single row, one keyspace 
definition per column with a TimeUUID as the row key (also servers as version 
identifier), keyspace name as column name, and definition serialization as the 
column value.  There exists a special row, keyed by "Last Migration" that 
contains a single column indicating the current schema version UUID.  This 
makes it easy to look up the version and then retrieve it.
+ The current set of keyspace definitions is stored in a single row, one 
keyspace per column, with a TimeUUID as the row key (which also serves as the 
version identifier), the keyspace name as the column name, and the serialized 
definition as the column value.  A special row, keyed by `"Last Migration"`, 
contains a single column holding the current schema version UUID.  This 
makes it easy to look up the current version and then retrieve its row.
  
  === Migrations (MIGRATIONS_CF) ===
- MIGRATIONS_CF tracks the individual modifications that are made to the 
schema.  It consists of a single row keyed by "Migrations Key" with one column 
per migration.  Each column has the migration version UUID as its name, with 
the serialized migration as its value.
+ `MIGRATIONS_CF` tracks the individual modifications (add, drop, rename) that 
are made to the schema.  It consists of a single row keyed by `"Migrations 
Key"` with one column per migration.  Each column has the migration version 
UUID as its name, with the serialized migration as its value.
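
To make the layout of the two column families concrete, here is a minimal in-memory sketch. It is not Cassandra code: the nested dicts stand in for rows and columns, `uuid.uuid1()` stands in for a TimeUUID, and the column name `"version"` inside the `"Last Migration"` row is an assumption, since the page only says that the row holds a single column.

```python
import uuid

# Hypothetical in-memory model of the `definitions` system table.
# Each column family maps row key -> {column name -> column value}.
schema_cf = {}
migrations_cf = {"Migrations Key": {}}

# A TimeUUID (version 1 UUID) identifies the schema version.
version = uuid.uuid1()

# SCHEMA_CF: one row per schema version, one column per keyspace.
schema_cf[version] = {"Keyspace1": b"<serialized keyspace definition>"}

# SCHEMA_CF: the special row that points at the current version.
# The column name "version" is an assumption made for this sketch.
schema_cf["Last Migration"] = {"version": version}

# MIGRATIONS_CF: a single row with one column per migration.
migrations_cf["Migrations Key"][version] = b"<serialized migration>"


def current_schema(schema_cf):
    """Look up the current version, then fetch that version's row."""
    last = schema_cf.get("Last Migration")
    if last is None:
        return None  # nothing stored yet, e.g. a new cluster
    return schema_cf[last["version"]]
```

Calling `current_schema(schema_cf)` here returns the single-keyspace row written above, and returns `None` for an empty table.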
  
- == Updating ==
+ == Operations ==
+ 
+ === Client Side ===
+  * Add column family or keyspace
+  * Drop column family or keyspace
+  * Rename column family or keyspace
+ 
+ These are all executed via the Thrift interface.  It is expected that you 
have `ALL` access if you are using security.  For rename and drop operations 
the client will block until all associated files are renamed or deleted.
+ 
+ === Server Side ===
  Applying a migration consists of the following steps:
- 1. Generate the migration, which includes a new version UUID.
+  1. Generate the migration, which includes a new version UUID.
- 2. Update SCHEMA_CF with a new schema row.
+  2. Update `SCHEMA_CF` with a new schema row.
- 3. Update MIGRATION_CF by appending a migration column.
+  3. Update `MIGRATIONS_CF` by appending a migration column.
- 4. Update the "Last Migration" row in SCHEMA_CF.
+  4. Update the `"Last Migration"` row in `SCHEMA_CF`.
- 5. Flush the definitions table.
+  5. Flush the definitions table.
- 6. Update runtime data structures (create directories, etc.)
+  6. Update runtime data structures (create directories, etc.)
  
- == Starting Up ==
- When a node starts up, it checks SCHEMA_CF to find out the latest schema 
version it has.  If it finds nothing (as would happen with a new cluster), it 
loads nothing and logs a warning.  Otherwise, it uses the uuid it just read in 
to locate the right row in SCHEMA_CF and loads it.  That row is deserialized 
into one or more keyspace definitions which are then loaded in a manner similar 
to the load-from-xml approach used in the past.
+ === Handling Failure ===
+ A node can fail during any step of the update process.  The list below 
describes what happens if a node fails after each step of the update 
process.
+  1. Nothing has been applied. Update fails outright.
+  2. Extra data exists in `SCHEMA_CF` but will be ignored because `"Last 
Migration"` was not updated.
+  3. Extra data exists in `SCHEMA_CF` and `MIGRATIONS_CF` but will be ignored 
because `"Last Migration"` was not updated.
+  4. Broken: the commit log is not replayed until *after* schemas are loaded 
on restart.  This means that `"Last Migration"` will be read, but cannot be 
loaded and applied.
+  5. Startup will happen normally.
+  6. Startup will happen normally.
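
The first three failure cases can be illustrated with a small simulation. This is a sketch under assumptions, not the server implementation: the table is a plain dict, `Crash`, `apply_migration`, `load_on_startup` and the `fail_after` parameter are hypothetical names, and the flush and commit-log behaviour behind case 4 is not modelled at all.

```python
import uuid


class Crash(Exception):
    """Simulated node failure (hypothetical, for illustration only)."""


def apply_migration(table, keyspaces, migration, fail_after=None):
    """Run update steps 1-4 in order, optionally failing after one of them."""

    def maybe_fail(step):
        if fail_after == step:
            raise Crash(f"node failed after step {step}")

    version = uuid.uuid1()  # step 1: generate the new version UUID
    maybe_fail(1)
    table["SCHEMA_CF"][version] = dict(keyspaces)  # step 2: new schema row
    maybe_fail(2)
    table["MIGRATIONS_CF"]["Migrations Key"][version] = migration  # step 3
    maybe_fail(3)
    # Step 4: only now does the new version become the current one.
    table["SCHEMA_CF"]["Last Migration"] = {"version": version}
    return version


def load_on_startup(table):
    """Follow "Last Migration" to the schema row, as a restart would."""
    last = table["SCHEMA_CF"].get("Last Migration")
    if last is None:
        return None
    return table["SCHEMA_CF"][last["version"]]
```

In this model a failure after step 1, 2 or 3 leaves `"Last Migration"` untouched, so `load_on_startup` still returns the previous schema and any extra rows or columns are simply never read.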
  
- At the same time, the node incorporates its schema version into the gossip 
digests it sends to other nodes.  It may be the case that this node does not 
have the latest schema definitions (as a result of network partition, 
bootstrapping a new node, or any other reason you can think of).  When a 
version mismatch is detected the definition promulgation mechanism described 
next is invoked.
+ === Starting Up ===
+ When a node starts up, it checks `SCHEMA_CF` to find out the latest schema 
version it has.  If it finds nothing (as would happen with a new cluster), it 
loads nothing and logs a warning.  Otherwise, it uses the UUID it just read in 
to load the correct row from `SCHEMA_CF`.  That row is deserialized into one or 
more keyspace definitions which are then loaded in a manner similar to the 
load-from-xml approach used in the past.
  
+ At the same time, the node incorporates its schema version UUID into the 
gossip digests it sends to other nodes.  It may be the case that this node does 
not have the latest schema definitions (as a result of network partition, 
bootstrapping a new node, or any other reason).  When a 
version mismatch is detected, the definition promulgation mechanism described 
next is invoked.
- == Definition Promulgation ==
- Definition promulgation consists of two phases: 'announce' and 'push'. 
'announce' is a way for node A to declare to node B 'this is the schema version 
I have'.  If the versions are equal, the message is ignored.  If A is newer, B 
responds with an 'announce' to A (this functions as a request for updates).  If 
A is older, B responds with an 'push' containing all the migrations from B that 
A doesn't have.  
  
- When a schema update originates from the client (Thrift), gossip promulgation 
is bypassed and this announce-announce-push approach to push migrations to 
other nodes.
+ === Definition Promulgation ===
+ Definition promulgation consists of two phases: ''announce'' and ''push''. 
''announce'' is a way for node A to declare to node B "this is the schema 
version I have".  If the versions are equal, the message is ignored.  If A is 
newer, B responds with an ''announce'' to A (this functions as a request for 
updates).  If A is older, B responds with a ''push'' containing all the 
migrations from B that A doesn't have.  
  
+ When a schema update originates from the client (Thrift), gossip promulgation 
is bypassed and this ''announce-announce-push'' approach is used to push 
migrations to other nodes.
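
The decision node B makes when it receives an ''announce'' from node A can be sketched as a single function. The function name, the return values and the rule for "newer" are assumptions made for illustration: the page does not say how versions are ordered, so this sketch compares the timestamps embedded in the version 1 UUIDs.

```python
import uuid


def handle_announce(my_version, my_migrations, their_version):
    """Decide how node B replies to an announce from node A.

    my_version:     B's current schema version (a version 1 UUID)
    my_migrations:  dict of migration version UUID -> serialized migration
    their_version:  the version A announced

    Returns None (ignore), ("announce", version) or ("push", migrations).
    """
    if their_version == my_version:
        return None  # versions are equal: ignore the message

    if their_version.time > my_version.time:
        # A is newer: announce back, which acts as a request for updates.
        return ("announce", my_version)

    # A is older: push every migration B has that A does not, oldest first.
    missing = sorted(
        (v for v in my_migrations if v.time > their_version.time),
        key=lambda v: v.time,
    )
    return ("push", [(v, my_migrations[v]) for v in missing])
```

Sending the push in timestamp order is a choice made here so that the receiver can apply the migrations one after another.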
+ 
+ === New Cluster ===
+ === Existing Cluster ===
+ 
- == Concurrency ==
+ === Concurrency ===
- It is entirely possible and expected that a node will receive migration 
pushes from multiple nodes.  Because of this, all migrations are applied on a 
single-threaded stage and versions are checked throughout to make sure that a) 
no migration is applied twice, and 2) migrations are not applied out of sync.
+ It is entirely possible and expected that a node will receive migration 
pushes from multiple nodes.  Because of this, all migrations are applied on a 
single-threaded stage and versions are checked throughout to make sure that no 
migration is applied twice and no migration is applied out of order.
  
  Each migration knows the version UUID of the migration that immediately 
precedes it.  If a node is asked to apply a migration and its current version 
UUID does not match the last version UUID of the migration, the migration is 
discarded.
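
The check described above, combined with a single-threaded stage, might look like the following sketch. It is illustrative only: `Migration` and `Node` are hypothetical classes, plain strings stand in for version UUIDs, and a one-worker `ThreadPoolExecutor` stands in for the stage.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Migration:
    version: str       # stand-in for this migration's version UUID
    last_version: str  # version of the migration that immediately precedes it


class Node:
    def __init__(self, version):
        self.version = version
        # One worker thread, so migrations are applied strictly one at a time.
        self.stage = ThreadPoolExecutor(max_workers=1)

    def _apply(self, migration):
        # Discard the migration unless it builds on the node's current version.
        if migration.last_version != self.version:
            return False
        self.version = migration.version
        return True

    def submit(self, migration):
        """Queue a migration on the stage; the future resolves to True if applied."""
        return self.stage.submit(self._apply, migration)
```

Because a migration only applies when its predecessor matches the node's current version, a second copy of the same migration arriving from another node is discarded, and so is one that skips a version.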
  
- One weakness of this model is that it is vulnerable if a new update starts 
before another update is promulgated to all live nodes--only one migration can 
be active within a cluster at any time.  To this we say: don't be stupid; plan 
and execute your migrations carefully.
+ One weakness of this model is that it is vulnerable if a new update starts 
before another update is promulgated to all live nodes--only one migration can 
be active within a cluster at any time.  One way to get around this is to 
choose one node and only initiate migrations through it. 
  
- == Failure Scenarios ==
- A node can fail during any step of the update process.  Here is an 
examination of what will happen if a node fails after each part of the update 
process (described earlier).
- 1. Nothing has been applied. Update fails outright.
- 2. Extra data exists in SCHEMA_CF but will be ignored because "Last 
Migration" was not updated.
- 3. Extra data exists in SCHEMA_CF and MIGRATION_CF but will be ignored 
because "Last Migration" was not updated.
- 4. Broken: commit log will not be replayed until *after* schemas are loaded 
on restart.  This means that the "Last Migration" will be read, but will not be 
able to be loaded and applied.
- 5. Startup will happen normally.
- 6. Startup will happen normally. 
- 
- == Under the Hood ==
- 
- 
- == Special Cases ==
- === New Cluster ===
-
