Dear Wiki user, You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.
The "Operations" page has been changed by JonathanEllis. http://wiki.apache.org/cassandra/Operations?action=diff&rev1=7&rev2=8 -------------------------------------------------- Replication strategy may not be changed without wiping your data and starting over. - = Adding new nodes = + == Network topology == + + Besides datacenters, you can also tell Cassandra which nodes are in the same rack within a datacenter. Cassandra will use this to route both reads and data movement for Range changes to the nearest replicas. This is configured by a user-pluggable !EndpointSnitch class in the configuration file. + + !EndpointSnitch is related to, but distinct from, replication strategy itself: !RackAwareStrategy needs a properly configured Snitch to places replicas correctly, but even absent a Strategy that cares about datacenters, the rest of Cassandra will still be location-sensitive. + + There is an example of a custom Snitch implementation in https://svn.apache.org/repos/asf/incubator/cassandra/trunk/contrib/property_snitch/. + + = Range changes = + + == Bootstrap == Adding new nodes is called "bootstrapping." To bootstrap a node, turn !AutoBootstrap on in the configuration file, and start it. - If you explicitly specify an !InitialToken in the configuration, the new node will bootstrap to that position on the ring. Otherwise, it will pick a Token that will give it half the keys from the node with the most disk space used, that does not already have another node boostrapping into its Range. + If you explicitly specify an !InitialToken in the configuration, the new node will bootstrap to that position on the ring. Otherwise, it will pick a Token that will give it half the keys from the node with the most disk space used, that does not already have another node bootstrapping into its Range. Important things to note: @@ -39, +49 @@ 1. Automatically picking a Token only allows doubling your cluster size at once; for more than that, let the first group finish before starting another. 1. As a safety measure, Cassandra does not automatically remove data from nodes that "lose" part of their Token Range to a newly added node. Run "nodeprobe cleanup" on the source node(s) when you are satisfied the new node is up and working. If you do not do this the old data will still be counted against the load on that node and future bootstrap attempts at choosing a location will be thrown off. + Cassandra is smart enough to transfer data from the nearest source node(s), if your !EndpointSnitch is configured correctly. So, the new node doesn't need to be in the same datacenter as the primary replica for the Range it is bootstrapping into, as long as another replica is in the datacenter with the new one. + - = Removing nodes entirely = + == Removing nodes entirely == You can take a node out of the cluster with `nodeprobe decommission.` The node must be live at decommission time (until CASSANDRA-564 is done). Again, no data is removed automatically, so if you want to put the node back into service and you don't need the data on it anymore, it should be removed manually. - = Moving nodes = + == Moving nodes == - Moving is essentially a convenience over decommission + bootstrap. + `nodeprobe move`: move the target node to to a given Token. Moving is essentially a convenience over decommission + bootstrap. 
  == Load balancing ==
- Also essentially a convenience over decommission + bootstrap, only instead of telling the node where to move on the ring it will choose its location based on the same heuristic as Token selection on bootstrap.
+ `nodeprobe loadbalance`: also essentially a convenience over decommission + bootstrap, only instead of telling the target node where to move on the ring it will choose its location based on the same heuristic as Token selection on bootstrap.
  
  = Consistency =
  Cassandra allows clients to specify the desired consistency level on reads and writes. (See [[API]].) If R + W > N, where R, W, and N are respectively the read replica count, the write replica count, and the replication factor, all client reads will see the most recent write. Otherwise, readers '''may''' see older versions, for periods of typically a few ms; this is called "eventual consistency." See http://www.allthingsdistributed.com/2008/12/eventually_consistent.html and http://queue.acm.org/detail.cfm?id=1466448 for more.
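+ 
+ For example, with a replication factor of N = 3, quorum reads and writes (R = 2, W = 2) satisfy 2 + 2 > 3: the replicas consulted on any read must overlap the replicas that acknowledged the latest write on at least one node, so the read is guaranteed to see that write. With R = 1 and W = 1, 1 + 1 <= 3, so a read may land on a replica the latest write has not reached yet, and only eventual consistency is guaranteed.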
