Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.

The "Operations" page has been changed by Chris Goffinet.
http://wiki.apache.org/cassandra/Operations?action=diff&rev1=18&rev2=19

--------------------------------------------------

  === Handling failure ===
  If a node goes down and comes back up, the ordinary repair mechanisms will be 
adequate to deal with any inconsistent data.  If a node goes down entirely, 
then you have two options:
  
+  1. (Recommended approach) Run `nodeprobe removetoken` on all live nodes. You 
will need to supply the token of the dead node; you can obtain it by running 
`nodeprobe ring` on any live node (unless there was some kind of outage and the 
other nodes came back up but the dead one did not).
+   
-  1. Bring up a replacement node with the same IP and Token as the old, and 
run `nodeprobe repair`.  Until the repair process is complete, clients reading 
only from this node may get no data back.  Using a higher !ConsistencyLevel on 
reads will avoid this.
-   * If you don't know the Token of the old node, you can retrieve it from any 
of the other nodes' `system` keyspace, !ColumnFamily `LocationInfo`, key `L`.
-   * You can also run `nodeprobe ring` to look up a node's token (unless there 
was some kind of outage and the other nodes came back up but the down one did 
not).
-  1. Remove the old token ring entry with `nodeprobe removetoken`.
-   * optionally, bootstrap a new node at either the old node's location (using 
the InitialToken configuration directive) or at an automatically determined 
one.  Since a bootstrapping node does not advertise itself as available for 
reads until it has all the data for its ranges transferred, this avoids the 
problem of clients reading at !ConsistencyLevel.ONE seeing empty replies.  This 
may also be more performant than using the `nodeprobe repair` approach; testing 
needed.
  
- Do not leave the old node permanently in the token ring as "Down"; when it is 
in this state, the cluster thinks it may eventually come back up with its old 
data, and will not re-replicate the data it was responsible for elsewhere.
+  Next, bring up the replacement node with a new IP address and with 
!AutoBootstrap set to true in storage-conf.xml. This places the replacement 
node in the cluster and finds an appropriate position for it automatically, and 
the bootstrap process then begins. The node will not receive reads until 
bootstrapping has finished.
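+  A minimal sketch of the relevant storage-conf.xml settings on such a 
replacement node (the address is hypothetical, and the element names are 
assumed to match the default storage-conf.xml of this era):
+  {{{
+  <!-- The replacement node's new IP address -->
+  <ListenAddress>10.0.0.99</ListenAddress>
+  <!-- Have the node pick a position and stream data for its ranges on startup -->
+  <AutoBootstrap>true</AutoBootstrap>
+  }}}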
+ 
+  1. (Advanced approach) Bring up a replacement node with the same IP and 
token as the old one, and run `nodeprobe repair`. Until the repair process is 
complete, clients reading only from this node may get no data back. Using a 
higher !ConsistencyLevel on reads will avoid this. You can obtain the old token 
by running `nodeprobe ring` on any live node (unless there was some kind of 
outage and the other nodes came back up but the dead one did not).
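+  A sketch of the repair step, once the replacement node is up with the old 
node's IP and with the old token set via the InitialToken directive in 
storage-conf.xml (the host name is hypothetical, and the -host flag is assumed 
from this era's `nodeprobe` usage):
+  {{{
+  # Repair the replacement node's data from the other replicas
+  bin/nodeprobe -host 10.0.0.42 repair
+  }}}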
+ 
+ You run `nodeprobe removetoken` on all live nodes so that Hinted Handoff 
stops collecting writes for the failed node; the sketch below shows the 
sequence.
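+ A sketch of the recommended approach end to end (host names and the token 
value are hypothetical; the -host flag is assumed from this era's `nodeprobe` 
usage):
+ {{{
+ # Find the dead node's token in the ring output from any live node
+ bin/nodeprobe -host 10.0.0.1 ring
+ 
+ # Remove that token on every live node so hints stop accumulating for it
+ bin/nodeprobe -host 10.0.0.1 removetoken 85070591730234615865843651857942052864
+ bin/nodeprobe -host 10.0.0.2 removetoken 85070591730234615865843651857942052864
+ }}}
+ Then start the replacement node with !AutoBootstrap enabled as described above.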
  
  == Backing up data ==
  Cassandra can snapshot data while online using `nodeprobe snapshot`.  You can 
then back up those snapshots using any desired system, although leaving them 
where they are is probably the option that makes the most sense on large 
clusters.
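+ For example, a snapshot can be taken on a node like this (the host and 
snapshot names are hypothetical; the -host flag and the optional snapshot-name 
argument are assumed from this era's `nodeprobe` usage):
+ {{{
+ # Take a named snapshot of this node's data files
+ bin/nodeprobe -host 10.0.0.1 snapshot before-upgrade
+ }}}
+ The snapshot is created as hard links under the node's data directories, so it 
consumes little extra space until the original SSTables are compacted away.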
