So I figured out the main cause of the problem: the node's seed was set to itself, which is what got it into a weird state. The second part was that I didn't know the default repair is incremental, because I was accidentally looking at the documentation for the wrong version. After running a repair -full, the 3 other nodes now seem to be synced correctly, as they have identical loads. Strangely, the problem node 10.128.0.20 now has 10 GB of load (the others have 6 GB). Since I now know I started it off in a very weird state, I'm going to just decommission it and add it back in from scratch. (To answer the earlier question about the data and cache directories: all working folders were cleared when I originally added it.)
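For anyone who runs into the same thing, here's roughly what the fix looked like on my end (a sketch, not my exact shell history; the IPs and the data/saved_caches paths are the ones from my setup and from Mike's question below, and the seed_provider block is the stock cassandra.yaml format):

# cassandra.yaml on 10.128.0.20: seeds should point at an existing node
# (e.g. 10.128.0.3), never at the node itself
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.128.0.3"

# force a full repair; plain 'nodetool repair' is incremental on 2.2 and later
nodetool repair -full

# since the node bootstrapped in a bad state, take it out and re-add it cleanly
nodetool decommission            # run on 10.128.0.20
rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/saved_caches/*
# (clear the commitlog directory too), then fix the seeds line above and
# restart Cassandra so the node bootstraps from scratch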
I feel Cassandra should throw an error and fail to bootstrap / join if the seed node is set to itself.

On Wed, May 25, 2016 at 2:37 AM Mike Yeap <wkk1...@gmail.com> wrote:

> Hi Luke, I've encountered a similar problem before, could you please advise on the following?
>
> 1) when you add 10.128.0.20, what are the seeds defined in cassandra.yaml?
>
> 2) when you add 10.128.0.20, were the data and cache directories in 10.128.0.20 empty?
>
> - /var/lib/cassandra/data
> - /var/lib/cassandra/saved_caches
>
> 3) if you do a compact in 10.128.0.3, what is the size shown in the "Load" column in "nodetool status <keyspace_name>"?
>
> 4) when you do the full repair, did you use "nodetool repair" or "nodetool repair -full"? I'm asking this because Incremental Repair is the default for Cassandra 2.2 and later.
>
> Regards,
> Mike Yeap
>
> On Wed, May 25, 2016 at 8:01 AM, Bryan Cheng <br...@blockcypher.com> wrote:
>
>> Hi Luke,
>>
>> I've never found nodetool status' load to be useful beyond a general indicator.
>>
>> You should expect some small skew, as this will depend on your current compaction status, tombstones, etc. IIRC repair will not provide consistency of intermediate states nor will it remove tombstones, it only guarantees consistency in the final state. This means, in the case of dropped hints or mutations, you will see differences in intermediate states, and therefore storage footprint, even in fully repaired nodes. This includes intermediate UPDATE operations as well.
>>
>> Your one node with sub 1GB sticks out like a sore thumb, though. Where did you originate the nodetool repair from? Remember that repair will only ensure consistency for ranges held by the node you're running it on. While I am not sure if missing ranges are included in this, if you ran nodetool repair only on a machine with partial ownership, you will need to complete repairs across the ring before data will return to full consistency.
>>
>> I would query some older data using consistency = ONE on the affected machine to determine if you are actually missing data. There are a few outstanding bugs in the 2.1.x and older release families that may result in tombstone creation even without deletes, for example CASSANDRA-10547, which impacts updates on collections in pre-2.1.13 Cassandra.
>>
>> You can also try examining the output of nodetool ring, which will give you a breakdown of tokens and their associations within your cluster.
>>
>> --Bryan
>>
>> On Tue, May 24, 2016 at 3:49 PM, kurt Greaves <k...@instaclustr.com> wrote:
>>
>>> Not necessarily, considering RF is 2, so both nodes should have all partitions. Luke, are you sure the repair is succeeding? You don't have other keyspaces/duplicate data/extra data in your cassandra data directory? Also, you could try querying on the node with less data to confirm whether it has the same dataset.
>>>
>>> On 24 May 2016 at 22:03, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>>>
>>>> For the other DC, it can be acceptable because each partition resides on one node, so if you have a large partition, it may skew things a bit.
>>>>
>>>> On May 25, 2016 2:41 AM, "Luke Jolly" <l...@getadmiral.com> wrote:
>>>>
>>>>> So I guess the problem may have been with the initial addition of the 10.128.0.20 node, because when I added it in it never synced data, I guess? It was at around 50 MB when it first came up and transitioned to "UN".
>>>>>
>>>>> After it was in, I did the 1->2 replication change and tried repair, but it didn't fix it. From what I can tell, all the data on it is stuff that has been written since it came up. We never delete data ever, so we should have zero tombstones.
>>>>>
>>>>> If I am not mistaken, only two of my nodes actually have all the data, 10.128.0.3 and 10.142.0.14, since they agree on the data amount. 10.142.0.13 is almost a GB lower, and then of course there is 10.128.0.20, which is missing over 5 GB of data. I tried running nodetool repair -local on both DCs and it didn't fix either one.
>>>>>
>>>>> Am I running into a bug of some kind?
>>>>>
>>>>> On Tue, May 24, 2016 at 4:06 PM Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>>>>>
>>>>>> Hi Luke,
>>>>>>
>>>>>> You mentioned that the replication factor was increased from 1 to 2. In that case, was the node bearing IP 10.128.0.20 carrying around 3GB of data earlier?
>>>>>>
>>>>>> You can run nodetool repair with the option -local to initiate a repair of the local datacenter for gce-us-central1.
>>>>>>
>>>>>> You may also suspect that, if a lot of data was deleted while the node was down, it may be holding a lot of tombstones which do not need to be replicated to the other node. To verify this, you can issue a select count(*) query on the column families (with the amount of data you have it should not be an issue), with tracing on and with consistency local_all, by connecting to either 10.128.0.3 or 10.128.0.20, and store the output in a file. It will give you a fair idea of how many deleted cells the nodes have. I tried searching for a reference on whether tombstones are moved around during repair, but I didn't find evidence of it. However, I see no reason they would be, because if the node didn't have the data then streaming tombstones does not make a lot of sense.
>>>>>>
>>>>>> Regards,
>>>>>> Bhuvan
>>>>>>
>>>>>> On Tue, May 24, 2016 at 11:06 PM, Luke Jolly <l...@getadmiral.com> wrote:
>>>>>>
>>>>>>> Here's my setup:
>>>>>>>
>>>>>>> Datacenter: gce-us-central1
>>>>>>> ===========================
>>>>>>> Status=Up/Down
>>>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>>>> --  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
>>>>>>> UN  10.128.0.3   6.4 GB     256     100.0%            3317a3de-9113-48e2-9a85-bbf756d7a4a6  default
>>>>>>> UN  10.128.0.20  943.08 MB  256     100.0%            958348cb-8205-4630-8b96-0951bf33f3d3  default
>>>>>>>
>>>>>>> Datacenter: gce-us-east1
>>>>>>> ========================
>>>>>>> Status=Up/Down
>>>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>>>> --  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
>>>>>>> UN  10.142.0.14  6.4 GB     256     100.0%            c3a5c39d-e1c9-4116-903d-b6d1b23fb652  default
>>>>>>> UN  10.142.0.13  5.55 GB    256     100.0%            d0d9c30e-1506-4b95-be64-3dd4d78f0583  default
>>>>>>>
>>>>>>> And my replication settings are:
>>>>>>>
>>>>>>> {'class': 'NetworkTopologyStrategy', 'aws-us-west': '2', 'gce-us-central1': '2', 'gce-us-east1': '2'}
>>>>>>>
>>>>>>> As you can see, 10.128.0.20 in the gce-us-central1 DC only has a load of 943 MB even though it's supposed to own 100% and should have 6.4 GB. 10.142.0.13 also seems not to have everything, as it only has a load of 5.55 GB.
>>>>>>>
>>>>>>> On Mon, May 23, 2016 at 7:28 PM, kurt Greaves <k...@instaclustr.com> wrote:
>>>>>>>
>>>>>>>> Do you have 1 node in each DC or 2?
>>>>>>>> If you're saying you have 1 node in each DC then a RF of 2 doesn't make sense. Can you clarify what your setup is?
>>>>>>>>
>>>>>>>> On 23 May 2016 at 19:31, Luke Jolly <l...@getadmiral.com> wrote:
>>>>>>>>
>>>>>>>>> I am running 3.0.5 with 2 nodes in two DCs, gce-us-central1 and gce-us-east1. I increased the replication factor of gce-us-central1 from 1 to 2. Then I ran 'nodetool repair -dc gce-us-central1'. The "Owns" for the node switched to 100% as it should, but the Load showed that it didn't actually sync the data. I then ran a full 'nodetool repair' and it still didn't fix it. This scares me, as I thought 'nodetool repair' was a way to assure consistency and that all the nodes were synced, but it doesn't seem to be. Outside of that command, I have no idea how I would assure all the data was synced or how to get the data correctly synced without decommissioning the node and re-adding it.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Kurt Greaves
>>>>>>>> k...@instaclustr.com
>>>>>>>> www.instaclustr.com
>>>
>>> --
>>> Kurt Greaves
>>> k...@instaclustr.com
>>> www.instaclustr.com