What is the correct procedure for data re-partitioning?

Suppose I have 3 nodes - "A", "B", "C" - with these tokens on the ring:

A: 0
B: 2.8356863910078205288614550619314e+37
C: 5.6713727820156410577229101238628e+37
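(As an aside, a rough, purely illustrative Python sketch of how evenly spaced tokens could be computed, assuming RandomPartitioner's 0..2**127 token space; this is not a Cassandra tool:)

# Illustrative only: evenly spaced tokens for an n-node ring,
# assuming the 0 .. 2**127 RandomPartitioner token space.
def initial_tokens(n):
    return [i * 2 ** 127 // n for i in range(n)]

print(initial_tokens(3))  # spacing for a 3-node ring
print(initial_tokens(4))  # spacing after growing to 4 nodes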
Then I add node "D", token: 1.4178431955039102644307275309655e+37 (B/2)
Start node "D" with -b
Wait
Run nodeprobe -host hostB ... cleanup on the live node "B"
Wait
Done

Now the data is not evenly balanced, because the tokens are not evenly spaced.

I see that there is a TokenUpdater tool (org.apache.cassandra.tools.TokenUpdater).
What happens to keys and data if I run it on "A", "B", "C" and "D" with new, better-spaced tokens?
Should I? Is there a better procedure? (A rough sketch of the resulting ring ownership follows after the quoted thread below.)

On Thu, Oct 1, 2009 at 12:48 PM, Jonathan Ellis <[email protected]> wrote:
> On Thu, Oct 1, 2009 at 11:26 AM, Igor Katkov <[email protected]> wrote:
> > Hi,
> >
> > Question#1:
> > How to manually select tokens to force equal spacing of tokens around the
> > hash space?
>
> (Answered by Jun.)
>
> > Question#2:
> > Let's assume that #1 was resolved somehow and key distribution is more or
> > less even.
> > A new node "C" joins the cluster. Its token falls somewhere between two
> > other tokens on the ring (from nodes "A" and "B", clockwise-ordered). From
> > now on "C" is responsible for a portion of data that used to exclusively
> > belong to "B".
> > a. Cassandra v.0.4 will not automatically transfer this data to "C", will it?
>
> It will, if you start C with the -b ("bootstrap") flag.
>
> > b. Do all reads to these keys fail?
>
> No.
>
> > c. What happens with the data referenced by these keys on "B"? It will never
> > be accessed there, therefore it becomes garbage. Since there is no GC, will
> > it stick around forever?
>
> nodeprobe cleanup after the bootstrap completes will instruct B to
> throw out data that has been copied to C.
>
> > d. What happens to replicas of these keys?
>
> These are also handled by -b.
>
> -Jonathan
>
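Here is the ownership sketch mentioned above - plain illustrative Python, not a Cassandra tool. It assumes RandomPartitioner's 0..2**127 token space and that each node owns the range from the previous token (exclusive) up to its own token (inclusive); the "evenly spaced" assignment is just one possible rebalanced layout.

RING = 2.0 ** 127  # assumed RandomPartitioner token space

def ownership(tokens):
    # Fraction of the ring each node owns: the range (previous token, own token],
    # wrapping around at the top of the ring.
    ordered = sorted(tokens.items(), key=lambda kv: kv[1])
    shares = {}
    for i, (node, tok) in enumerate(ordered):
        prev = ordered[i - 1][1]  # for i == 0 this wraps to the largest token
        shares[node] = ((tok - prev) % RING) / RING
    return shares

# Tokens from the example above, after bootstrapping "D" at B/2.
after_adding_d = {
    "A": 0.0,
    "D": 1.4178431955039102644307275309655e+37,
    "B": 2.8356863910078205288614550619314e+37,
    "C": 5.6713727820156410577229101238628e+37,
}

# One possible evenly spaced layout for the same four nodes.
evenly_spaced = {node: i * RING / 4 for i, node in enumerate(["A", "D", "B", "C"])}

for label, toks in [("current", after_adding_d), ("evenly spaced", evenly_spaced)]:
    print(label, {n: round(100 * s, 1) for n, s in ownership(toks).items()})

Under that 2**127 assumption the current tokens work out to very uneven ownership (roughly A 66.7%, D 8.3%, B 8.3%, C 16.7%) versus 25% each when evenly spaced, which is what the "better spaced tokens" question above is getting at.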
