Is that also true if you're standing up multiple nodes from backups that already have data? Could you not stand up more than one at a time since they already have the data?
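For the restore-from-backup case, the snapshot advice Jeff gives further down the thread (copy sstables back, then `nodetool refresh`, or re-stream with `sstableloader`) can be sketched roughly as below. This is a rough, untested sketch: the keyspace/table names, snapshot name, and paths are all hypothetical, and the real data directory includes a table UUID suffix that varies per cluster.

```shell
# Rough sketch only -- "my_ks", "my_table", the snapshot name, and all paths
# are hypothetical placeholders, not values from this thread.

# Option A: copy snapshotted sstables back into the table's live data
# directory on the node that owns them...
cp /backups/my_ks/my_table/snapshots/pre_cleanup/* \
   /var/lib/cassandra/data/my_ks/my_table-*/

# ...then tell Cassandra to load the newly placed sstables without a restart:
nodetool refresh my_ks my_table

# Option B: stream the snapshot to whichever nodes now own the token ranges
# (sstableloader routes the data to the correct replicas automatically):
sstableloader -d 10.0.0.1,10.0.0.2 /backups/my_ks/my_table/
```

Because `sstableloader` routes rows to their current owners, it is the safer option after ring ownership has changed; `nodetool refresh` only makes sense when the node still owns the ranges in the copied sstables.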
On Mon, Oct 19, 2015 at 10:48 AM, Eric Stevens <migh...@gmail.com> wrote:

> It seems to me that as long as cleanup hasn't happened, if you
> *decommission* the newly joined nodes, they'll stream whatever writes
> they took back to the original replicas. Presumably that should be pretty
> quick, as they won't have nearly as much data as the original nodes (they
> only hold data written while they were online). Then, as long as cleanup
> hasn't happened, your cluster should have returned to a consistent view of
> the data. You can now bootstrap the new nodes again.
>
> If you have done a cleanup, then the data is probably irreversibly
> corrupted, and you will have to figure out how to restore the missing data
> incrementally from backups, if they are available.
>
> On Sun, Oct 18, 2015 at 10:37 PM Raj Chudasama <raj.chudas...@gmail.com>
> wrote:
>
>> In this case, does it make sense to remove the newly added nodes, correct
>> the configuration, and have them rejoin one at a time?
>>
>> Thx
>>
>> Sent from my iPhone
>>
>> On Oct 18, 2015, at 11:19 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
>> wrote:
>>
>> Take a snapshot now, before you get rid of any data (whatever you do,
>> don’t run cleanup).
>>
>> If you identify missing data, you can go back to those snapshots, find
>> the nodes that had the data previously (sstable2json, for example), and
>> either re-stream that data into the cluster with sstableloader or copy it
>> to a new host and `nodetool refresh` it into the new system.
>>
>> From: <burtonator2...@gmail.com> on behalf of Kevin Burton
>> Reply-To: "user@cassandra.apache.org"
>> Date: Sunday, October 18, 2015 at 8:10 PM
>> To: "user@cassandra.apache.org"
>> Subject: Re: Would we have data corruption if we bootstrapped 10 nodes
>> at once?
>>
>> Ouch.. OK.. I think I really shot myself in the foot here, then. This
>> might be bad.
>>
>> I'm not sure if I would have missing data. I mean, basically the data is
>> on the other nodes, but the cluster has been running with 10 nodes
>> accidentally bootstrapped with auto_bootstrap=false.
>>
>> So they have new data and seem to be missing values.
>>
>> This is somewhat misleading... Initially, if you start it up and run
>> nodetool status, it only returns one node.
>>
>> So I assumed auto_bootstrap=false meant that it just doesn't join the
>> cluster.
>>
>> I'm running a nodetool repair now to hopefully fix this.
>>
>> On Sun, Oct 18, 2015 at 7:25 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
>> wrote:
>>
>>> auto_bootstrap=false tells it to join the cluster without running
>>> bootstrap – the node assumes it has all of the necessary data, and won’t
>>> stream any missing data.
>>>
>>> This generally violates consistency guarantees, but if done on a single
>>> node, it is typically correctable with `nodetool repair`.
>>>
>>> If you do it on many nodes at once, it’s possible that the new nodes
>>> could represent all 3 replicas of the data, but don’t physically have any
>>> of that data, leading to missing records.
>>>
>>> From: <burtonator2...@gmail.com> on behalf of Kevin Burton
>>> Reply-To: "user@cassandra.apache.org"
>>> Date: Sunday, October 18, 2015 at 3:44 PM
>>> To: "user@cassandra.apache.org"
>>> Subject: Re: Would we have data corruption if we bootstrapped 10 nodes
>>> at once?
>>>
>>> Ah shit.. I think we're seeing corruption.. missing records :-/
>>>
>>> On Sat, Oct 17, 2015 at 10:45 AM, Kevin Burton <bur...@spinn3r.com>
>>> wrote:
>>>
>>>> We just migrated from a 30 node cluster to a 45 node cluster (so 15
>>>> new nodes).
>>>>
>>>> By default we have auto_bootstrap = false,
>>>> so we just push our config to the cluster, the cassandra daemons
>>>> restart, and they're not cluster members and are the only nodes in the
>>>> cluster.
>>>>
>>>> Anyway. While I was about 1/2 way done adding the 15 nodes, I had
>>>> about 7 members of the cluster and 8 not yet joined.
>>>>
>>>> We are only doing 1 at a time because apparently bootstrapping more
>>>> than 1 is unsafe.
>>>>
>>>> I did a rolling restart whereby I went through and restarted all the
>>>> cassandra boxes.
>>>>
>>>> Somehow the new nodes auto bootstrapped themselves EVEN though
>>>> auto_bootstrap=false.
>>>>
>>>> We don't have any errors. Everything seems functional. I'm just
>>>> worried about data loss.
>>>>
>>>> Thoughts?
>>>>
>>>> Kevin
>>>>
>>>> --
>>>>
>>>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>>>> Engineers!
>>>>
>>>> Founder/CEO Spinn3r.com
>>>> Location: *San Francisco, CA*
>>>> blog: http://burtonator.wordpress.com
>>>> … or check out my Google+ profile
>>>> <https://plus.google.com/102718274791889610666/posts>
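The recovery path Eric describes at the top of the thread (decommission the mis-bootstrapped nodes before any cleanup, then re-add them with bootstrap enabled, one at a time) can be sketched as the following operator sequence. This is a hedged sketch, not a tested runbook: the service command and the cassandra.yaml location vary by install, and everything assumes `nodetool cleanup` has not yet been run.

```shell
# Hedged sketch of the recovery sequence discussed in this thread.
# Run against ONE mis-bootstrapped node at a time; assumes no cleanup has run.

# 1. Stream the node's recent writes back to the original replicas and
#    remove it from the ring:
nodetool decommission

# 2. Re-enable bootstrap in cassandra.yaml on that node (or delete the
#    override entirely, since true is the default):
#      auto_bootstrap: true

# 3. Restart Cassandra so the node rejoins and streams its full data set
#    from the existing replicas (exact command depends on the install):
sudo service cassandra restart

# 4. Wait until the node shows state UN (Up/Normal) before touching the next:
nodetool status

# 5. Only after every node has rejoined cleanly and data is verified,
#    reclaim data the old nodes no longer own:
nodetool cleanup
```

Doing the restarts strictly one node at a time matters here for the same reason bootstrapping one at a time does: concurrent range movements can leave replicas that never received their data.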