I performed the test again. The test environment is similar to before:

Address       Status   Load        Range                                      Ring
                                   170141183460469231731687303715884105728
10.237.4.85   Up       378.41 MB   21267647932558653966460912964485513216    |<--|
10.237.1.135  Up       377.04 MB   42535295865117307932921825928971026432    |   ^
10.237.1.137  Up       378.21 MB   63802943797675961899382738893456539648    v   |
10.237.1.139  Up       372.93 MB   85070591730234615865843651857942052864    |   ^
10.237.1.140  Up       371.95 MB   106338239662793269832304564822427566080   v   |
10.237.1.141  Up       366.18 MB   127605887595351923798765477786913079296   |   ^
10.237.1.143  Up       364.12 MB   148873535527910577765226390751398592512   v   |
10.237.1.144  Up       370.39 MB   170141183460469231731687303715884105728   |-->|

Perform the following test:

1. Kill the service on 10.237.1.135 and clean up all data on that node
   (remove the whole data directory, not just a single table).
2. Wait some time until the other nodes have noticed that 10.237.1.135 is down.
3. Restart the service on every node except 10.237.1.135 (10.237.1.135 stays
   down). <-- THIS IS THE DIFFERENCE FROM MY PREVIOUS TEST
4. Re-configure 10.237.1.135:
   ....
   <AutoBootstrap>true</AutoBootstrap>
   ....
   <InitialToken>42535295865117307932921825928971026432</InitialToken>
   ....
5. Start the service on 10.237.1.135.
6. Wait some time and check the system.log of 10.237.1.135: it did indeed
   perform a bootstrap, but no data was transferred.
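For reference, the InitialToken set in step 4 is simply 10.237.1.135's original token; the tokens in the ring above are evenly spaced over the RandomPartitioner token space (0 to 2**127). A minimal sketch (my own illustration, not part of the test) that reproduces them:

    # Reproduce the evenly spaced tokens of the 8-node ring above
    # (RandomPartitioner token space is 0 .. 2**127).
    NUM_NODES = 8
    TOKEN_SPACE = 2 ** 127

    for i in range(1, NUM_NODES + 1):
        print(TOKEN_SPACE * i // NUM_NODES)

    # i = 2 prints 42535295865117307932921825928971026432, the
    # InitialToken re-assigned to 10.237.1.135 in step 4.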
In step 3, after restarting all services (except 10.237.1.135), the cluster should have no information about the existence of 10.237.1.135, so when 10.237.1.135 is restarted in step 5 it should bootstrap and pull its data from the other nodes. But it does not work as expected.

---------END----------

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Monday, January 04, 2010 9:42 AM
To: cassandra-user@incubator.apache.org
Subject: Re: bug in bootstraping??

... what should also work is bootstrapping the new node in (with a
different IP) FIRST, in between the old node's token and its
successor's. Then part of the range's data will be transferred on
bootstrap, and the rest when you decommission the old one afterwards.

-Jonathan

On Sun, Jan 3, 2010 at 7:20 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> This is working as designed; to use the bootstrap approach you must
> removetoken the old entry first. This is not necessary with the
> "nodeprobe repair" approach to recovery. I will edit the wiki to make
> this more clear.
>
> On Sun, Jan 3, 2010 at 3:11 AM, Michael Lee
> <mail.list.steel.men...@gmail.com> wrote:
>> Hi, guys:
>>
>> If one node of a cluster goes down and its data is damaged, it can
>> theoretically restore that data by bootstrapping (wiki link Operations).
>>
>> But sometimes it loses some or all of its original data.
>>
>> Suppose an 8-node cluster holding 10000 rows of about 100 KB each, with
>> ReplicationFactor 3:
>>
>> Address       Status   Load        Range                                      Ring
>>                                    170141183460469231731687303715884105728
>> 10.237.4.85   Up       378.41 MB   21267647932558653966460912964485513216    |<--|
>> 10.237.1.135  Up       377.04 MB   42535295865117307932921825928971026432    |   ^
>> 10.237.1.137  Up       378.21 MB   63802943797675961899382738893456539648    v   |
>> 10.237.1.139  Up       372.93 MB   85070591730234615865843651857942052864    |   ^
>> 10.237.1.140  Up       371.95 MB   106338239662793269832304564822427566080   v   |
>> 10.237.1.141  Up       366.18 MB   127605887595351923798765477786913079296   |   ^
>> 10.237.1.143  Up       364.12 MB   148873535527910577765226390751398592512   v   |
>> 10.237.1.144  Up       370.39 MB   170141183460469231731687303715884105728   |-->|
>>
>> Perform the following test:
>>
>> 1. Kill the service on 10.237.1.135, clean up all data on that node
>>    (remove the whole data directory, not just a single table).
>> 2. Re-configure 10.237.1.135:
>>    ....
>>    <InitialToken>42535295865117307932921825928971026432</InitialToken>
>>    ....
>>    <AutoBootstrap>true</AutoBootstrap>
>> 3. Start the service on 10.237.1.135.
>> 4. Wait a very long time, then check what happens:
>>
>> Address       Status   Load        Range                                      Ring
>>                                    170141183460469231731687303715884105728
>> 10.237.4.85   Up       378.41 MB   21267647932558653966460912964485513216    |<--|  /// it's the seed; my cluster has only one seed
>> 10.237.1.135  Up       0 bytes     42535295865117307932921825928971026432    |   ^  /// lost all data
>> 10.237.1.137  Up       378.21 MB   63802943797675961899382738893456539648    v   |
>> 10.237.1.139  Up       372.93 MB   85070591730234615865843651857942052864    |   ^
>> 10.237.1.140  Up       371.95 MB   106338239662793269832304564822427566080   v   |
>> 10.237.1.141  Up       366.18 MB   127605887595351923798765477786913079296   |   ^
>> 10.237.1.143  Up       364.12 MB   148873535527910577765226390751398592512   v   |
>> 10.237.1.144  Up       370.39 MB   170141183460469231731687303715884105728   |-->|
>>
>> Checking the system.log of 10.237.1.135, we can see that 10.237.1.135
>> did indeed do some bootstrapping.
>>
>> If I repeat the above test with any other node except 10.237.1.135 (and
>> 10.237.4.85 of course, since it is the seed and a seed cannot bootstrap),
>> some nodes can restore about 120~200 MB of data by bootstrapping, while
>> some nodes restore nothing.
>>
>> I know 'removetoken' can fix the replicas, but if I removetoken first and
>> then bring the node back, some data will be moved twice, which is a waste
>> of network bandwidth.
>>
>> So the question is: is this "random bootstrapping" behavior a bug, or is
>> it by design?
>>
>> ---------END----------
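As a footnote (this is my own simplified illustration, not code from Cassandra): with ReplicationFactor 3 and the default rack-unaware placement, each range is stored on the node that owns it plus the next two nodes clockwise, so after the wipe the data for 10.237.1.135's range should still be fully present on 10.237.1.137 and 10.237.1.139, and that is where a successful bootstrap would be expected to stream it from. A rough Python sketch of that placement model:

    from bisect import bisect_left

    # Simplified model of RF=3 rack-unaware replica placement on the ring
    # shown earlier (illustration only, not Cassandra's actual code).
    RING = [
        (21267647932558653966460912964485513216,  "10.237.4.85"),
        (42535295865117307932921825928971026432,  "10.237.1.135"),
        (63802943797675961899382738893456539648,  "10.237.1.137"),
        (85070591730234615865843651857942052864,  "10.237.1.139"),
        (106338239662793269832304564822427566080, "10.237.1.140"),
        (127605887595351923798765477786913079296, "10.237.1.141"),
        (148873535527910577765226390751398592512, "10.237.1.143"),
        (170141183460469231731687303715884105728, "10.237.1.144"),
    ]
    RF = 3

    def replicas_for(key_token):
        """Nodes that should hold a key with this token: the range's owner
        plus the next RF-1 nodes clockwise around the ring."""
        tokens = [t for t, _ in RING]
        owner = bisect_left(tokens, key_token) % len(RING)  # wrap around
        return [RING[(owner + k) % len(RING)][1] for k in range(RF)]

    # A key in the range (21267..., 42535...] owned by 10.237.1.135:
    print(replicas_for(30000000000000000000000000000000000000))
    # -> ['10.237.1.135', '10.237.1.137', '10.237.1.139']

Under this model it is also easy to see why doing removetoken first and bootstrapping afterwards moves some data twice, as noted above: removing the token shifts the affected replica ranges onto the next nodes clockwise, and re-inserting the same token shifts them back.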