Dear Wiki user, You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.
The "Operations" page has been changed by JonathanEllis.
http://wiki.apache.org/cassandra/Operations?action=diff&rev1=7&rev2=8

--------------------------------------------------

Note that with !RackAwareStrategy, succeeding nodes along the ring should alternate data centers to avoid hot spots. For instance, if you have nodes A, B, C, and D in increasing Token order, and instead of alternating you place A and B in DC1, and C and D in DC2, then nodes C and A will have disproportionately more data on them because they will be the replica destination for every Token range in the other data center.

- Replication strategy may not be changed without wiping your data and starting over.
+ Replication strategy is not intended to be changed after loading data, but it can be done if you need to badly enough. The procedure would look something like:
+ 1. have each node do an anticompaction for its primary range
+ 1. manually scp the resulting SSTables to the new replica points
+ 1. then switch the replication strategy
+
+ This could be done offline, or online at the cost of introducing some temporary inconsistency that could be fixed by repair (see below).

= Adding new nodes =
Adding new nodes is called "bootstrapping."
@@ -65, +70 @@
 1. Remove the old node from the ring first, or bring up a replacement node with the same IP and Token as the old one; otherwise, the old node will stay part of the ring in a "down" state, which will degrade your replication factor for the affected Range.
  * If you don't know the Token of the old node, you can retrieve it from any of the other nodes' `system` keyspace, !ColumnFamily `LocationInfo`, key `L`.
  * You can also run `nodeprobe ring` to look up a node's token (unless there was some kind of outage, and the others came up but not the down one).
- 1. Removing the old node, then bootstrapping the new one, may be more performant than using Anti-Entropy. Testing needed.
+ 1. Removing the old node, then bootstrapping the new one, may be more performant than using Anti-Entropy (testing needed), and will eliminate incorrect answers given by the replacement node while it does not yet have all the data for its Range.
- * Even brute-force rsyncing of data from the relevant replicas and running cleanup on the replacement node may be more performant
+ * To test: even brute-force rsyncing of data from the relevant replicas and running cleanup on the replacement node may be more performant.

= Backing up data =
Cassandra can snapshot data while online using `nodeprobe snapshot`. You can then back up those snapshots using any desired system, although leaving them where they are is probably the option that makes the most sense on large clusters.
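The hot-spot claim in the RackAwareStrategy note, and the "scp to the new replica points" step of a strategy switch, can be checked with a toy placement model. This is a minimal sketch, not Cassandra's code. It makes the following simplifying assumptions:
* The replication factor is 2.
* Under the rack-aware layout, the second replica is the first node clockwise from the range owner that sits in a different data center. Racks are ignored.
* The "old" layout is modeled as rack-unaware: the owner plus the next node clockwise.
* Each primary range is named after its owner.

All function names here are hypothetical.

```python
# Toy model of replica placement (assumption: RF=2, racks ignored).

def rack_aware_replicas(ring, dc_of):
    """Map each primary range (named after its owner) to its replica nodes.

    ring:  node names in increasing Token order
    dc_of: node name -> data center name
    The second replica is the first node clockwise in another data center.
    """
    placement = {}
    n = len(ring)
    for i, owner in enumerate(ring):
        replicas = [owner]
        for step in range(1, n):
            candidate = ring[(i + step) % n]
            if dc_of[candidate] != dc_of[owner]:
                replicas.append(candidate)
                break
        placement[owner] = replicas
    return placement


def rack_unaware_replicas(ring, rf=2):
    """Replicate each range to its owner and the next rf - 1 nodes clockwise."""
    n = len(ring)
    return {owner: [ring[(i + k) % n] for k in range(rf)]
            for i, owner in enumerate(ring)}


def ranges_per_node(placement):
    """Count how many ranges each node stores."""
    counts = {}
    for replicas in placement.values():
        for node in replicas:
            counts[node] = counts.get(node, 0) + 1
    return counts


def copy_plan(old, new):
    """For each range, list the nodes that must receive a copy under `new`.

    Only additions are listed; nodes that stop being replicas are not handled.
    """
    plan = {}
    for rng, replicas in new.items():
        missing = [node for node in replicas if node not in old[rng]]
        if missing:
            plan[rng] = missing
    return plan


ring = ["A", "B", "C", "D"]
grouped = {"A": "DC1", "B": "DC1", "C": "DC2", "D": "DC2"}
alternating = {"A": "DC1", "B": "DC2", "C": "DC1", "D": "DC2"}

print(ranges_per_node(rack_aware_replicas(ring, grouped)))
print(ranges_per_node(rack_aware_replicas(ring, alternating)))
print(copy_plan(rack_unaware_replicas(ring), rack_aware_replicas(ring, grouped)))
```

With the grouped layout, A and C each store three of the four ranges while B and D store one, matching the note above. Alternating data centers gives every node two ranges.

The last line lists the copies needed when moving from the rack-unaware layout to the grouped rack-aware one: the range owned by A must go to C, and the range owned by C must go to A. It does not model cleaning up nodes that are no longer replicas.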
