Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.

The "Operations" page has been changed by JonathanEllis.
The comment on this change is: explain repair/bootstrap options more clearly.
http://wiki.apache.org/cassandra/Operations?action=diff&rev1=17&rev2=18

--------------------------------------------------

   1. Anti-Entropy: when `nodeprobe repair` is run, Cassandra performs a major 
compaction, computes a Merkle Tree of the data on that node, and compares it 
with the versions on the other replicas to catch any out-of-sync data that 
hasn't been read recently.  This is intended to be run infrequently (e.g., 
weekly), since major compaction is relatively expensive.
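
As a rough sketch, a weekly cron entry like the following would do it (the 
install path, schedule, and JMX host are assumptions; adjust for your 
environment):

{{{
# Hypothetical crontab entry: run anti-entropy repair every Sunday at
# 03:00, when the major compaction it triggers is least disruptive.
0 3 * * 0  /opt/cassandra/bin/nodeprobe -host localhost repair
}}}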
  
  === Handling failure ===
- If a node goes down and comes back up, the ordinary repair mechanisms will be 
adequate to deal with any inconsistent data.  If a node goes down entirely, you 
should be aware of the following as well:
+ If a node goes down and comes back up, the ordinary repair mechanisms will be 
adequate to deal with any inconsistent data.  If a node goes down entirely, 
then you have two options:
  
-  1. Remove the old node from the ring first, or bring up a replacement node 
with the same IP and Token as the old; otherwise, the old node will stay part 
of the ring in a "down" state, which will degrade your replication factor for 
the affected Range
+  1. Bring up a replacement node with the same IP and Token as the old, and 
run `nodeprobe repair` (see the first sketch after this list).  Until the 
repair completes, clients reading only from this node may get no data back; 
using a higher !ConsistencyLevel on reads avoids this.
    * If you don't know the Token of the old node, you can retrieve it from any 
of the other nodes' `system` keyspace, !ColumnFamily `LocationInfo`, key `L`.
    * You can also run `nodeprobe ring` to look up a node's token (unless 
there was some kind of outage and the other nodes came back up but the down 
one did not).
-  1. Removing the old node, then bootstrapping the new one, may be more 
performant than using Anti-Entropy.  Testing needed.
-   * Even brute-force rsyncing of data from the relevant replicas and running 
cleanup on the replacement node may be more performant
+  1. Remove the old token ring entry with `nodeprobe removetoken` (see the 
second sketch after this list).
+   * Optionally, bootstrap a new node at either the old node's location 
(using the InitialToken configuration directive) or at an automatically 
determined one.  Since a bootstrapping node does not advertise itself as 
available for reads until all the data for its ranges has been transferred, 
this avoids the problem of clients reading at !ConsistencyLevel.ONE seeing 
empty replies.  This may also be more performant than the `nodeprobe repair` 
approach; testing is needed.
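
A sketch of the first option, assuming `nodeprobe` is on the PATH and using 
illustrative addresses (10.0.0.5 is the dead node, 10.0.0.6 a surviving one):

{{{
# 1. Recover the dead node's token from a surviving node's view of the
#    ring (it is also stored in the system keyspace, ColumnFamily
#    LocationInfo, key L):
nodeprobe -host 10.0.0.6 ring

# 2. Start the replacement with the same IP and the recovered token
#    (set via InitialToken in its storage-conf.xml), then repair it:
nodeprobe -host 10.0.0.5 repair

# 3. Until the repair finishes, have clients read at a ConsistencyLevel
#    above ONE (e.g. QUORUM) so the still-sparse replica is never the
#    only one consulted.
}}}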
+ 
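A sketch of the second option; the token value and the config excerpt are 
illustrative, and the assumption here is that `removetoken` takes the token 
to evict as its argument:

{{{
# 1. Evict the dead node's entry from the ring:
nodeprobe -host 10.0.0.6 removetoken 85070591730234615865843651857942052864

# 2. Optionally bootstrap a fresh node.  In its storage-conf.xml, enable
#    bootstrap and (to reclaim the old position) pin the token:
#
#      <AutoBootstrap>true</AutoBootstrap>
#      <InitialToken>85070591730234615865843651857942052864</InitialToken>
#
#    Leave InitialToken empty to let the new node pick its own spot.
}}}
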
+ Do not leave the old node permanently in the token ring as "Down"; while it 
is in this state, the cluster assumes it may eventually come back up with its 
old data, and will not re-replicate the data it was responsible for elsewhere.
  
  == Backing up data ==
  Cassandra can snapshot data while online using `nodeprobe snapshot`.  You can 
then back up those snapshots using any desired system, although leaving them 
where they are is probably the option that makes the most sense on large 
clusters.
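
As an illustration, a minimal backup pass might look like this; the data 
directory path comes from the default DataFileDirectory setting and is an 
assumption:

{{{
# Take an online snapshot on this node (hard-links the current
# SSTables, so it is cheap):
nodeprobe -host localhost snapshot

# Snapshots land under each keyspace's data directory; archive them
# with whatever tool you prefer:
tar czf /backups/cassandra-snapshot-$(date +%Y%m%d).tar.gz \
    /var/lib/cassandra/data/*/snapshots
}}}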
