Re: distributing tokens equally along the key distribution space

Igor Katkov Thu, 01 Oct 2009 10:36:39 -0700

OK, so I don't need to use tokenupdater. What are the steps to rebalance
data around the circle?


In my test example (see below), I have A, D, B and C (clockwise), where:
A holds 1/3 of the data
D holds 1/6
B holds 1/6
C holds 1/3
I'm willing to change the tokens manually; that's all right.
How do I tell all the nodes to move data around in version 0.4? Do I change
the token on node A and restart it with -b? Then do the same for the rest,
restarting only one node at a time?
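To make the imbalance concrete, here is a small Python sketch that computes each node's share of the ring from its token. It assumes the usual ring rule that a node owns the range from its predecessor's token (exclusive) up to its own token (inclusive), wrapping around at the end of the ring. The ring size 12 and the tokens 0, 2, 4 and 8 are hypothetical toy values chosen to reproduce the 1/3, 1/6, 1/6, 1/3 split described above; they are not real Cassandra tokens, and `ownership` is an illustrative helper, not a Cassandra API.

```python
from fractions import Fraction


def ownership(tokens, ring_size):
    """Return the fraction of the ring owned by each node.

    Assumes each node owns (predecessor's token, its own token],
    wrapping around at ring_size.
    """
    ordered = sorted(tokens.items(), key=lambda kv: kv[1])
    if len(ordered) == 1:
        # A single node owns the whole ring.
        return {ordered[0][0]: Fraction(1)}
    result = {}
    for i, (name, token) in enumerate(ordered):
        prev_token = ordered[i - 1][1]  # for i == 0 this wraps to the last node
        span = (token - prev_token) % ring_size
        result[name] = Fraction(span, ring_size)
    return result


# Hypothetical toy ring of size 12, laid out clockwise as A, D, B, C.
shares = ownership({"A": 0, "D": 2, "B": 4, "C": 8}, ring_size=12)
for name in ("A", "D", "B", "C"):
    print(name, shares[name])
```

D's token sits halfway between A and B, so D and B split what B used to own, while A and C keep their larger ranges.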



On Thu, Oct 1, 2009 at 1:22 PM, Jonathan Ellis <[email protected]> wrote:

> tokenupdater does not move data around; it's just an alternative to
> setting <initialtoken> on each node. So you really want to get your
> tokens right for your initial set of nodes before adding data.
>
> We're finishing up full load balancing for 0.5, but even then it's best
> to start with a reasonable distribution instead of starting with
> random tokens and forcing the balancer to move things around a bunch.
>
> On Thu, Oct 1, 2009 at 12:14 PM, Igor Katkov <[email protected]> wrote:
> > What is the correct procedure for data re-partitioning?
> > Suppose I have 3 nodes: "A", "B", "C".
> > Tokens on the ring:
> > A: 0
> > B: 2.8356863910078205288614550619314e+37
> > C: 5.6713727820156410577229101238628e+37
> >
> > Then I add node "D", token: 1.4178431955039102644307275309655e+37 (B/2)
> > Start node "D" with -b
> > Wait
> > Run nodeprobe -host hostB ... cleanup on live "B"
> > Wait
> > Done
> >
> > Now data is not evenly balanced because tokens are not evenly spaced. I
> > see that there is tokenupdater (org.apache.cassandra.tools.TokenUpdater).
> > What happens to keys and data if I run it on "A", "B", "C" and "D" with
> > new, better-spaced tokens? Should I? Is there a better procedure?
> >
> >
> >
> >
> > On Thu, Oct 1, 2009 at 12:48 PM, Jonathan Ellis <[email protected]> wrote:
> >>
> >> On Thu, Oct 1, 2009 at 11:26 AM, Igor Katkov <[email protected]> wrote:
> >> > Hi,
> >> >
> >> > Question #1:
> >> > How do I manually select tokens to force equal spacing of tokens around
> >> > the hash space?
> >>
> >> (Answered by Jun.)
> >>
> >> > Question #2:
> >> > Let's assume that #1 was resolved somehow and key distribution is more
> >> > or less even.
> >> > A new node "C" joins the cluster. Its token falls somewhere between two
> >> > other tokens on the ring (from nodes "A" and "B", clockwise-ordered).
> >> > From now on "C" is responsible for a portion of the data that used to
> >> > belong exclusively to "B".
> >> > a. Cassandra v0.4 will not automatically transfer this data to "C",
> >> > will it?
> >>
> >> It will, if you start C with the -b ("bootstrap") flag.
> >>
> >> > b. Do all reads to these keys fail?
> >>
> >> No.
> >>
> >> > c. What happens to the data referenced by these keys on "B"? It will
> >> > never be accessed there, so it becomes garbage. Since there is no GC
> >> > for it, will it stick around forever?
> >>
> >> nodeprobe cleanup after the bootstrap completes will instruct B to
> >> throw out data that has been copied to C.
> >>
> >> > d. What happens to replicas of these keys?
> >>
> >> These are also handled by -b.
> >>
> >> -Jonathan
> >
> >
>
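Following Jonathan's advice to get the tokens right before adding data, here is a minimal Python sketch for picking evenly spaced tokens, token_i = i * RING_SIZE / N. It assumes the RandomPartitioner with a token space of size 2**127 (treat that constant as an assumption to check against your version); `RING_SIZE` and `balanced_tokens` are hypothetical names. The printed values are candidates for each node's <initialtoken> setting, not output of any Cassandra tool.

```python
# Assumption: RandomPartitioner tokens span a ring of size 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count, ring_size=RING_SIZE):
    """Return node_count evenly spaced tokens: i * ring_size // node_count."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * ring_size // node_count for i in range(node_count)]


# Example: target tokens for the four-node ring (A, D, B, C) discussed above.
for name, token in zip(("A", "D", "B", "C"), balanced_tokens(4)):
    print(name, token)
```

Integer division keeps the tokens exact integers; when N does not divide the ring size evenly, neighbouring gaps differ by at most one token, which is negligible.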
