Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.

The "Operations" page has been changed by scott white.
http://wiki.apache.org/cassandra/Operations?action=diff&rev1=28&rev2=29

--------------------------------------------------

  The following applies to Cassandra 0.5.
  
  == Hardware ==
- 
- See [[CassandraHardware]]
+ See CassandraHardware
  
  == Tuning ==
- 
- See [[PerformanceTuning]]
+ See PerformanceTuning
  
  == Ring management ==
  Each Cassandra server [node] is assigned a unique Token that determines what 
keys it is the primary replica for.  If you sort all nodes' Tokens, the Range 
of keys each is responsible for is (!PreviousToken, !MyToken], that is, from 
the previous token (exclusive) to the node's token (inclusive).  The machine 
with the lowest Token gets both all keys less than that token, and all keys 
greater than the largest Token; this is called a "wrapping Range."
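
  To make the Range arithmetic concrete, here is a minimal Python sketch (my illustration, not Cassandra code; the node names and Tokens are made up) of how a key's Token maps to its primary replica, including the wrapping Range:

  {{{
  from bisect import bisect_left

  # Sorted (token, node) pairs; Tokens are made-up small integers for clarity.
  ring = [(10, "A"), (40, "B"), (70, "C")]
  tokens = [t for t, _ in ring]

  def primary_replica(key_token):
      # Each node owns (PreviousToken, MyToken]: bisect_left finds the first
      # node Token >= key_token; past the largest Token we wrap around to the
      # lowest-Token node, which owns the "wrapping Range".
      i = bisect_left(tokens, key_token)
      return ring[i % len(ring)][1]

  assert primary_replica(40) == "B"  # the Token itself is inclusive: (10, 40]
  assert primary_replica(41) == "C"
  assert primary_replica(99) == "A"  # greater than the largest Token: wraps
  assert primary_replica(5) == "A"   # less than the lowest Token
  }}}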
@@ -31, +29 @@

   * !RackAwareStrategy: replica 2 is placed on the first node along the ring that belongs in '''another''' data center from the first; the remaining N-2 replicas, if any, are placed on the first nodes along the ring in the '''same''' rack as the first
  
  Note that with !RackAwareStrategy, succeeding nodes along the ring should 
alternate data centers to avoid hot spots.  For instance, if you have nodes A, 
B, C, and D in increasing Token order, and instead of alternating you place A 
and B in DC1, and C and D in DC2, then nodes C and A will have 
disproportionately more data on them because they will be the replica 
destination for every Token range in the other data center.
+ 
   * The corollary: if you want to start with a single DC and add another later, add as many nodes to the second DC as you have in the first, rather than adding a node or two at a time.
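
  The hot-spot effect is easy to reproduce with a small Python simulation (again my illustration, mimicking the replica-2 rule described above rather than Cassandra's actual code):

  {{{
  from collections import Counter

  def replica2_counts(ring):
      """ring: nodes in increasing Token order as (name, datacenter) pairs."""
      counts = Counter()
      n = len(ring)
      for i in range(n):
          # Replica 2 for node i's Range goes to the first node along the
          # ring that is in a different data center (the rule above).
          for j in range(1, n):
              name, dc = ring[(i + j) % n]
              if dc != ring[i][1]:
                  counts[name] += 1
                  break
      return counts

  # Alternating data centers: each node picks up exactly one extra Range.
  print(replica2_counts([("A", "DC1"), ("B", "DC2"), ("C", "DC1"), ("D", "DC2")]))

  # A and B in DC1, C and D in DC2: C and A each pick up two extra Ranges,
  # exactly the imbalance described above.
  print(replica2_counts([("A", "DC1"), ("B", "DC1"), ("C", "DC2"), ("D", "DC2")]))
  }}}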
  
  Replication strategy is not intended to be changed once live, but if you are 
sufficiently motivated it can be done with some manual effort:
+ 
   1. anticompact each node's primary Range, yielding sstables containing only that Range's data
   1. copy those sstables to the nodes responsible for extra replicas under the 
new strategy
   1. change the strategy and restart
@@ -41, +41 @@

  Replication factor is not really intended to be changed in a live cluster 
either, but increasing it may be done if you (a) use ConsistencyLevel.QUORUM or 
ALL (depending on your existing replication factor) to make sure that a replica 
that actually has the data is consulted, (b) are willing to accept downtime 
while anti-entropy repair runs (see below), or (c) are willing to live with 
some clients potentially being told no data exists if they read from the new 
replica location(s) until repair is done.
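
  A quick Python check of the arithmetic behind option (a) (my illustration, not from this page): a QUORUM read at the new replication factor is only guaranteed to consult a replica that actually has the data if a quorum must overlap the old replica set; otherwise you need ALL:

  {{{
  def quorum(n):
      return n // 2 + 1

  def quorum_read_sees_old_data(old_rf, new_rf):
      # A quorum of new_rf replicas must intersect the old_rf replicas that
      # actually hold the data exactly when quorum(new_rf) + old_rf > new_rf
      # (pigeonhole: the two sets cannot fit disjointly into new_rf nodes).
      return quorum(new_rf) + old_rf > new_rf

  print(quorum_read_sees_old_data(2, 3))  # True: any 2 of 3 replicas include
                                          # one of the 2 originals
  print(quorum_read_sees_old_data(1, 3))  # False: a quorum can miss the one
                                          # original, so use ALL instead
  }}}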
  
  Reducing replication factor is easily done and only requires running cleanup 
afterwards to remove extra replicas.
-  
+ 
  === Network topology ===
- 
  Besides datacenters, you can also tell Cassandra which nodes are in the same 
rack within a datacenter.  Cassandra will use this to route both reads and data 
movement for Range changes to the nearest replicas.  This is configured by a 
user-pluggable !EndpointSnitch class in the configuration file.
  
  !EndpointSnitch is related to, but distinct from, replication strategy 
itself: !RackAwareStrategy needs a properly configured Snitch to place 
replicas correctly, but even absent a Strategy that cares about datacenters, 
the rest of Cassandra will still be location-sensitive.
@@ -51, +50 @@

  There is an example of a custom Snitch implementation in 
https://svn.apache.org/repos/asf/incubator/cassandra/trunk/contrib/property_snitch/.
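
  For intuition, here is a conceptual Python sketch of the questions a Snitch answers; the real interface is a Java class named in the configuration file, and all names below are illustrative only:

  {{{
  # Hypothetical topology table: address -> (datacenter, rack).
  TOPOLOGY = {
      "10.0.1.1": ("DC1", "rack1"),
      "10.0.1.2": ("DC1", "rack2"),
      "10.0.2.1": ("DC2", "rack1"),
  }

  class ToySnitch:
      def in_same_datacenter(self, a, b):
          return TOPOLOGY[a][0] == TOPOLOGY[b][0]

      def on_same_rack(self, a, b):
          return TOPOLOGY[a] == TOPOLOGY[b]

      def sort_by_proximity(self, origin, replicas):
          # Nearest first: same rack, then same datacenter, then everything
          # else.  An ordering like this is what lets Cassandra route reads
          # and Range movements to the nearest replicas.
          def distance(addr):
              if self.on_same_rack(origin, addr):
                  return 0
              if self.in_same_datacenter(origin, addr):
                  return 1
              return 2
          return sorted(replicas, key=distance)

  snitch = ToySnitch()
  print(snitch.sort_by_proximity("10.0.1.1", ["10.0.2.1", "10.0.1.2"]))
  # ['10.0.1.2', '10.0.2.1'] -- the same-DC replica sorts first
  }}}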
  
  == Range changes ==
- 
  === Bootstrap ===
  Adding new nodes is called "bootstrapping."
  
@@ -62, +60 @@

  Important things to note:
  
   1. You should wait long enough for all the nodes in your cluster to become 
aware of the bootstrapping node via gossip before starting another bootstrap.  
For most clusters 30s will be plenty of time.
-  1. Automatically picking a Token only allows doubling your cluster size at 
once; for more than that, let the first group finish before starting another.
+  1. Related to point 1: with automatic token picking you can only bootstrap N nodes at a time, where N is the size of the existing cluster. If you need to more than double the size of your cluster, wait for the first N nodes to finish, so that the cluster is size 2N, before bootstrapping more nodes. For example, if your current cluster is 5 nodes and you want to add 7, bootstrap 5, let those finish, and then bootstrap the last 2 (see the sketch after this list).
   1. As a safety measure, Cassandra does not automatically remove data from 
nodes that "lose" part of their Token Range to a newly added node.  Run 
"nodetool cleanup" on the source node(s) when you are satisfied the new node is 
up and working. If you do not do this, the old data will still be counted 
against the load on that node and future bootstrap attempts at choosing a 
location will be thrown off.
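
  A tiny Python helper (illustrative only) that applies the doubling rule from point 2 to plan bootstrap batches:

  {{{
  def bootstrap_batches(cluster_size, nodes_to_add):
      # With automatic Token picking you can bootstrap at most N nodes at a
      # time, where N is the current cluster size; each batch must finish
      # (growing the cluster) before the next one starts.
      batches = []
      while nodes_to_add > 0:
          batch = min(cluster_size, nodes_to_add)
          batches.append(batch)
          cluster_size += batch
          nodes_to_add -= batch
      return batches

  print(bootstrap_batches(5, 7))   # [5, 2] -- the example from point 2
  print(bootstrap_batches(3, 10))  # [3, 6, 1]
  }}}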
  
  Cassandra is smart enough to transfer data from the nearest source node(s), 
if your !EndpointSnitch is configured correctly.  So, the new node doesn't need 
to be in the same datacenter as the primary replica for the Range it is 
bootstrapping into, as long as another replica is in the datacenter with the 
new one.
@@ -93, +91 @@

  If a node goes down entirely, then you have two options:
  
   1. (Recommended approach) Bring up the replacement node with a new IP 
address, and !AutoBootstrap set to true in storage-conf.xml. This will place 
the replacement node in the cluster and find the appropriate position 
automatically, and the bootstrap process begins. While this process runs, the 
node will not receive reads. Once bootstrap finishes on the replacement node, 
run `nodetool removetoken` once, supplying the token of the dead node, and 
`nodetool cleanup` on each node.
-  * You can obtain the dead node's token by running `nodetool ring` on any 
live node, unless there was some kind of outage, and the others came up but not 
the down one -- in that case, you can retrieve the token from the live nodes' 
system tables.
+   * You can obtain the dead node's token by running `nodetool ring` on any 
live node, unless there was some kind of outage and the others came up but not 
the down one -- in that case, you can retrieve the token from the live nodes' 
system tables.
  
-  1. (Alternative approach) Bring up a replacement node with the same IP and 
token as the old, and run `nodetool repair`. Until the repair process is 
complete, clients reading only from this node may get no data back.  Using a 
higher !ConsistencyLevel on reads will avoid this. 
+  1. (Alternative approach) Bring up a replacement node with the same IP and 
token as the old, and run `nodetool repair`. Until the repair process is 
complete, clients reading only from this node may get no data back.  Using a 
higher !ConsistencyLevel on reads will avoid this.
  
The reason you run `nodetool cleanup` on all live nodes is to remove old 
Hinted Handoff writes stored for the dead node.
  
@@ -112, +110 @@

  {{{
  Usage: sstable2json [-f outfile] <sstable> [-k key [-k key [...]]]
  }}}
- 
  `bin/sstable2json` accepts, as a required argument, the full path to an SSTable data file (files ending in -Data.db), and an optional argument for an output file (by default, output is written to stdout). You can also pass the names of specific keys using the `-k` argument to limit what is exported.
  
  Note: If you are not running the exporter on in-place SSTables, there are a 
couple of things to keep in mind.
+ 
   1. The corresponding configuration must be present (same as it would be to 
run a node).
-  2. SSTables are expected to be in a directory named for the keyspace (same 
as they would be on a production node).
+  1. SSTables are expected to be in a directory named for the keyspace (same 
as they would be on a production node).
  
  JSON exported SSTables can be "imported" to create new SSTables using 
`bin/json2sstable`:
  
  {{{
  Usage: json2sstable -K keyspace -c column_family <json> <sstable>
  }}}
- 
  `bin/json2sstable` takes arguments for keyspace and column family names, and 
full paths for the JSON input file and the destination SSTable file name.
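
  For example, a round trip through the two tools, driven from Python, might look like this (all paths, the keyspace, and the column family below are hypothetical):

  {{{
  import subprocess

  # Export an SSTable to JSON, then rebuild a new SSTable from that JSON.
  data_file = "/var/lib/cassandra/data/Keyspace1/Standard1-1-Data.db"
  json_file = "/tmp/Standard1.json"

  subprocess.run(["bin/sstable2json", "-f", json_file, data_file], check=True)
  subprocess.run(["bin/json2sstable", "-K", "Keyspace1", "-c", "Standard1",
                  json_file, "/tmp/rebuilt-Standard1-Data.db"], check=True)
  }}}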
  
  You can also import pre-serialized rows of data using the BinaryMemtable 
interface.  This is useful for importing via Hadoop or another source where you 
want to do some preprocessing of the data to import.
@@ -136, +133 @@

  
  Important metrics to watch on a per-Column Family basis would be: '''Read 
Count, Read Latency, Write Count and Write Latency'''. '''Pending Tasks''' tell 
you if things are backing up. These metrics can also be exposed using any JMX 
client such as `jconsole`.
  
- You can also use jconsole, and the MBeans tab to look at PendingTasks for 
thread pools. If you see one particular thread backing up, this can give you an 
indication of a problem. One example would be ROW-MUTATION-STAGE indicating 
that write requests are arriving faster than they can be handled. A more subtle 
example is the FLUSH stages: if these start backing up, cassandra is accepting 
writes into memory fast enough, but the sort-and-write-to-disk stages are 
falling behind. 
+ You can also use the MBeans tab in jconsole to look at PendingTasks for 
thread pools. If you see one particular thread pool backing up, this can give 
you an indication of a problem. One example would be ROW-MUTATION-STAGE, 
indicating that write requests are arriving faster than they can be handled. A 
more subtle example is the FLUSH stages: if these start backing up, Cassandra 
is accepting writes into memory fast enough, but the sort-and-write-to-disk 
stages are falling behind.
  
  If you see a lot of tasks building up, your hardware or configuration is probably the bottleneck.
  
