OK, so I don't need to use tokenupdater. What are the steps to rebalance data around the circle?
In my test example (see below), I have A, D, B and C (clockwise), where
A holds 1/3 of the data
D - 1/6
B - 1/6
C - 1/3

I'm willing to change tokens manually; that's all right. How do I tell all nodes to move data around in version 0.4? Do I change the token on node A and restart it with -b? Then the same for the rest, restarting only one node at a time?

On Thu, Oct 1, 2009 at 1:22 PM, Jonathan Ellis <[email protected]> wrote:
> tokenupdater does not move data around; it's just an alternative to
> setting <initialtoken> on each node. so you really want to get your
> tokens right for your initial set of nodes before adding data.
>
> we're finishing up full load balancing for 0.5 but even then it's best
> to start with a reasonable distribution instead of starting with
> random and forcing the balancer to move things around a bunch.
>
> On Thu, Oct 1, 2009 at 12:14 PM, Igor Katkov <[email protected]> wrote:
> > What is the correct procedure for data re-partitioning?
> > Suppose I have 3 nodes - "A", "B", "C"
> > tokens on the ring:
> > A: 0
> > B: 2.8356863910078205288614550619314e+37
> > C: 5.6713727820156410577229101238628e+37
> >
> > Then I add node "D", token: 1.4178431955039102644307275309655e+37 (B/2)
> > Start node "D" with -b
> > Wait
> > Run nodeprobe -host hostB ... cleanup on live "B"
> > Wait
> > Done
> >
> > Now data is not evenly balanced because tokens are not evenly spaced. I see
> > that there is tokenupdater (org.apache.cassandra.tools.TokenUpdater).
> > What happens with keys and data if I run it on "A", "B", "C" and "D" with
> > new, better spaced tokens? Should I? Is there a better procedure?
> >
> > On Thu, Oct 1, 2009 at 12:48 PM, Jonathan Ellis <[email protected]> wrote:
> >>
> >> On Thu, Oct 1, 2009 at 11:26 AM, Igor Katkov <[email protected]> wrote:
> >> > Hi,
> >> >
> >> > Question#1:
> >> > How to manually select tokens to force equal spacing of tokens around
> >> > the hash space?
> >>
> >> (Answered by Jun.)
> >>
> >> > Question#2:
> >> > Let's assume that #1 was resolved somehow and key distribution is more
> >> > or less even.
> >> > A new node "C" joins the cluster. Its token falls somewhere between two
> >> > other tokens on the ring (from nodes "A" and "B" clockwise-ordered).
> >> > From now on "C" is responsible for a portion of data that used to
> >> > exclusively belong to "B".
> >> > a. Cassandra v.0.4 will not automatically transfer this data to "C",
> >> > will it?
> >>
> >> It will, if you start C with the -b ("bootstrap") flag.
> >>
> >> > b. Do all reads to these keys fail?
> >>
> >> No.
> >>
> >> > c. What happens with the data referenced by these keys on "B"? It will
> >> > never be accessed there, therefore it becomes garbage. Since there is
> >> > no GC, will it stick forever?
> >>
> >> nodeprobe cleanup after the bootstrap completes will instruct B to
> >> throw out data that has been copied to C.
> >>
> >> > d. What happens to replicas of these keys?
> >>
> >> These are also handled by -b.
> >>
> >> -Jonathan

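For the token-spacing question raised in this thread, here is a minimal sketch of how evenly spaced tokens could be computed ahead of time. It assumes the random partitioner with a token space of 0 to 2**127, which is an assumption and not something stated in the thread. The function names `evenly_spaced_tokens` and `ownership_fractions` are hypothetical helpers, not part of Cassandra. The sketch only computes numbers; it does not move any data.

```python
# Assumption: tokens live on a ring of size 2**127 (random partitioner).
RING_SIZE = 2 ** 127


def evenly_spaced_tokens(node_count, ring_size=RING_SIZE):
    """Return node_count tokens spaced evenly around the ring, starting at 0."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer arithmetic keeps full precision for these very large values.
    return [i * ring_size // node_count for i in range(node_count)]


def ownership_fractions(tokens, ring_size=RING_SIZE):
    """Return the share of the ring owned by each token, in sorted token order.

    Each node is taken to own the range from the previous token (exclusive)
    up to its own token (inclusive), wrapping around the ring.
    Tokens are expected to be distinct.
    """
    ordered = sorted(tokens)
    fractions = []
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # index -1 wraps to the last token
        # A single node owns the whole ring, so a zero span means a full turn.
        span = (token - previous) % ring_size or ring_size
        fractions.append(span / ring_size)
    return fractions


if __name__ == "__main__":
    tokens = evenly_spaced_tokens(4)
    for token, share in zip(tokens, ownership_fractions(tokens)):
        # Print tokens as plain integers rather than in exponent notation.
        print(token, share)
```

Running the same `ownership_fractions` helper on the tokens currently assigned to a ring is one way to see how uneven the spacing is before deciding which tokens to change.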