After thinking about it more, I have no idea how that worked at all. I
must not have cleared out the working directory or something... Regardless,
I did something weird with my initial joining of the cluster and then
wasn't using repair -full. Thank y'all very much for the info.
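
For anyone who finds this thread later, the distinction that mattered
here (on 2.2+, where plain repair is incremental) is roughly the
following; "my_keyspace" is a placeholder for the real keyspace name:

    # Incremental repair -- what a plain "nodetool repair" runs on 2.2+
    nodetool repair my_keyspace

    # Full repair -- what actually brought the replicas back in sync here
    nodetool repair -full my_keyspace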
On Wed, May 25, 2016 at 3:11 PM Luke Jolly <l...@getadmiral.com> wrote:

> So I figured out the main cause of the problem. The seed node was set to
> itself. That's what got it in a weird state. The second part was that I
> didn't know the default repair is incremental, as I was accidentally
> looking at the wrong version's documentation. After running a repair
> -full, the 3 other nodes seem to be synced correctly, as they have
> identical loads. Strangely, the problem node 10.128.0.20 now has 10 GB
> of load (the others have 6 GB). Since I now know I started it off in a
> very weird state, I'm going to just decommission it and add it back in
> from scratch. When I added it, all working folders were cleared.
>
> I feel Cassandra should throw an error and fail to bootstrap / join if
> the seed node is set to itself.
>
> On Wed, May 25, 2016 at 2:37 AM Mike Yeap <wkk1...@gmail.com> wrote:
>
>> Hi Luke, I've encountered a similar problem before, could you please
>> advise on the following?
>>
>> 1) when you added 10.128.0.20, what were the seeds defined in
>> cassandra.yaml?
>>
>> 2) when you added 10.128.0.20, were the data and cache directories in
>> 10.128.0.20 empty?
>>
>>    - /var/lib/cassandra/data
>>    - /var/lib/cassandra/saved_caches
>>
>> 3) if you do a compact on 10.128.0.3, what is the size shown in the
>> "Load" column of "nodetool status <keyspace_name>"?
>>
>> 4) when you did the full repair, did you use "nodetool repair" or
>> "nodetool repair -full"? I'm asking this because incremental repair is
>> the default for Cassandra 2.2 and later.
>>
>> Regards,
>> Mike Yeap
>>
>> On Wed, May 25, 2016 at 8:01 AM, Bryan Cheng <br...@blockcypher.com>
>> wrote:
>>
>>> Hi Luke,
>>>
>>> I've never found nodetool status' load to be useful beyond a general
>>> indicator.
>>>
>>> You should expect some small skew, as this will depend on your current
>>> compaction status, tombstones, etc. IIRC repair will not provide
>>> consistency of intermediate states, nor will it remove tombstones; it
>>> only guarantees consistency in the final state. This means, in the
>>> case of dropped hints or mutations, you will see differences in
>>> intermediate states, and therefore storage footprint, even in fully
>>> repaired nodes. This includes intermediate UPDATE operations as well.
>>>
>>> Your one node with sub-1GB load sticks out like a sore thumb, though.
>>> Where did you originate the nodetool repair from? Remember that repair
>>> will only ensure consistency for ranges held by the node you're
>>> running it on. While I am not sure if missing ranges are included in
>>> this, if you ran nodetool repair only on a machine with partial
>>> ownership, you will need to complete repairs across the ring before
>>> data will return to full consistency.
>>>
>>> I would query some older data using consistency = ONE on the affected
>>> machine to determine if you are actually missing data. There are a few
>>> outstanding bugs in the 2.1.x and older release families that may
>>> result in tombstone creation even without deletes, for example
>>> CASSANDRA-10547, which impacts updates on collections in pre-2.1.13
>>> Cassandra.
>>>
>>> You can also try examining the output of nodetool ring, which will
>>> give you a breakdown of tokens and their associations within your
>>> cluster.
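>>>
>>> For example, something along these lines from cqlsh on the light node
>>> (keyspace, table, and key below are placeholders for your own schema):
>>>
>>>     $ cqlsh 10.128.0.20
>>>     cqlsh> CONSISTENCY ONE;
>>>     cqlsh> SELECT * FROM my_keyspace.my_table WHERE id = 'old-key' LIMIT 10;
>>>
>>> If rows you know were written well before the node joined come back
>>> empty at ONE, the node is genuinely missing data rather than just
>>> showing a smaller on-disk footprint.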
>>>
>>> --Bryan
>>>
>>> On Tue, May 24, 2016 at 3:49 PM, kurt Greaves <k...@instaclustr.com>
>>> wrote:
>>>
>>>> Not necessarily, considering RF is 2, so both nodes should have all
>>>> partitions. Luke, are you sure the repair is succeeding? You don't
>>>> have other keyspaces/duplicate data/extra data in your cassandra data
>>>> directory? Also, you could try querying on the node with less data to
>>>> confirm whether it has the same dataset.
>>>>
>>>> On 24 May 2016 at 22:03, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>>>>
>>>>> For the other DC, it can be acceptable because partitions reside on
>>>>> one node, so if you have a large partition, it may skew things a
>>>>> bit.
>>>>>
>>>>> On May 25, 2016 2:41 AM, "Luke Jolly" <l...@getadmiral.com> wrote:
>>>>>
>>>>>> So I guess the problem may have been with the initial addition of
>>>>>> the 10.128.0.20 node, because when I added it in, it never synced
>>>>>> data, I guess? It was at around 50 MB when it first came up and
>>>>>> transitioned to "UN". After it was in, I did the 1->2 replication
>>>>>> change and tried repair, but it didn't fix it. From what I can
>>>>>> tell, all the data on it is stuff that has been written since it
>>>>>> came up. We never delete data, ever, so we should have zero
>>>>>> tombstones.
>>>>>>
>>>>>> If I am not mistaken, only two of my nodes actually have all the
>>>>>> data, 10.128.0.3 and 10.142.0.14, since they agree on the data
>>>>>> amount. 10.142.0.13 is almost a GB lower, and then of course there
>>>>>> is 10.128.0.20, which is missing over 5 GB of data. I tried running
>>>>>> nodetool repair -local on both DCs and it didn't fix either one.
>>>>>>
>>>>>> Am I running into a bug of some kind?
>>>>>>
>>>>>> On Tue, May 24, 2016 at 4:06 PM Bhuvan Rawal <bhu1ra...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Luke,
>>>>>>>
>>>>>>> You mentioned that the replication factor was increased from 1 to
>>>>>>> 2. In that case, was the node bearing IP 10.128.0.20 carrying
>>>>>>> around 3 GB of data earlier?
>>>>>>>
>>>>>>> You can run nodetool repair with the -local option to initiate a
>>>>>>> repair local to the gce-us-central1 datacenter.
>>>>>>>
>>>>>>> Also, you may suspect that if a lot of data was deleted while the
>>>>>>> node was down, it may be holding a lot of tombstones which do not
>>>>>>> need to be replicated to the other node. To verify this, you can
>>>>>>> issue a select count(*) query on the column families (with the
>>>>>>> amount of data you have it should not be an issue) with tracing on
>>>>>>> and with consistency ALL, connecting to either 10.128.0.3 or
>>>>>>> 10.128.0.20, and store the output in a file. It will give you a
>>>>>>> fair idea of how many deleted cells the nodes have. I tried
>>>>>>> searching for a reference on whether tombstones are moved around
>>>>>>> during repair, but I didn't find evidence of it. However, I see no
>>>>>>> reason they would be, because if the node didn't have the data,
>>>>>>> then streaming tombstones does not make a lot of sense.
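>>>>>>>
>>>>>>> Something like this in cqlsh would do it (table name is a
>>>>>>> placeholder; the trace reports live rows and tombstone cells read
>>>>>>> per node):
>>>>>>>
>>>>>>>     $ cqlsh 10.128.0.3
>>>>>>>     cqlsh> CONSISTENCY ALL;
>>>>>>>     cqlsh> TRACING ON;
>>>>>>>     cqlsh> SELECT count(*) FROM my_keyspace.my_table;
>>>>>>>
>>>>>>> A large tombstone count on one node but not the other would
>>>>>>> support the deleted-data theory.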
>>>>>>>
>>>>>>> Regards,
>>>>>>> Bhuvan
>>>>>>>
>>>>>>> On Tue, May 24, 2016 at 11:06 PM, Luke Jolly <l...@getadmiral.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Here's my setup:
>>>>>>>>
>>>>>>>> Datacenter: gce-us-central1
>>>>>>>> ===========================
>>>>>>>> Status=Up/Down
>>>>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>>>>> --  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
>>>>>>>> UN  10.128.0.3   6.4 GB     256     100.0%            3317a3de-9113-48e2-9a85-bbf756d7a4a6  default
>>>>>>>> UN  10.128.0.20  943.08 MB  256     100.0%            958348cb-8205-4630-8b96-0951bf33f3d3  default
>>>>>>>>
>>>>>>>> Datacenter: gce-us-east1
>>>>>>>> ========================
>>>>>>>> Status=Up/Down
>>>>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>>>>> --  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
>>>>>>>> UN  10.142.0.14  6.4 GB     256     100.0%            c3a5c39d-e1c9-4116-903d-b6d1b23fb652  default
>>>>>>>> UN  10.142.0.13  5.55 GB    256     100.0%            d0d9c30e-1506-4b95-be64-3dd4d78f0583  default
>>>>>>>>
>>>>>>>> And my replication settings are:
>>>>>>>>
>>>>>>>> {'class': 'NetworkTopologyStrategy', 'aws-us-west': '2',
>>>>>>>> 'gce-us-central1': '2', 'gce-us-east1': '2'}
>>>>>>>>
>>>>>>>> As you can see, 10.128.0.20 in the gce-us-central1 DC only has a
>>>>>>>> load of 943 MB even though it's supposed to own 100% and should
>>>>>>>> have 6.4 GB. 10.142.0.13 also seems not to have everything, as it
>>>>>>>> only has a load of 5.55 GB.
>>>>>>>>
>>>>>>>> On Mon, May 23, 2016 at 7:28 PM, kurt Greaves
>>>>>>>> <k...@instaclustr.com> wrote:
>>>>>>>>
>>>>>>>>> Do you have 1 node in each DC or 2? If you're saying you have 1
>>>>>>>>> node in each DC, then an RF of 2 doesn't make sense. Can you
>>>>>>>>> clarify what your setup is?
>>>>>>>>>
>>>>>>>>> On 23 May 2016 at 19:31, Luke Jolly <l...@getadmiral.com> wrote:
>>>>>>>>>
>>>>>>>>>> I am running 3.0.5 with 2 nodes in two DCs, gce-us-central1 and
>>>>>>>>>> gce-us-east1. I increased the replication factor of
>>>>>>>>>> gce-us-central1 from 1 to 2. Then I ran 'nodetool repair -dc
>>>>>>>>>> gce-us-central1'. The "Owns" for the node switched to 100% as
>>>>>>>>>> it should, but the Load showed that it didn't actually sync the
>>>>>>>>>> data. I then ran a full 'nodetool repair' and it still didn't
>>>>>>>>>> fix it. This scares me, as I thought 'nodetool repair' was a
>>>>>>>>>> way to assure consistency and that all the nodes were synced,
>>>>>>>>>> but it doesn't seem to be. Outside of that command, I have no
>>>>>>>>>> idea how I would assure all the data was synced, or how to get
>>>>>>>>>> the data correctly synced, without decommissioning the node and
>>>>>>>>>> re-adding it.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Kurt Greaves
>>>>>>>>> k...@instaclustr.com
>>>>>>>>> www.instaclustr.com
>>>>
>>>> --
>>>> Kurt Greaves
>>>> k...@instaclustr.com
>>>> www.instaclustr.com
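
P.S. For completeness, the replication change that kicked this all off
was a plain ALTER KEYSPACE, something along these lines (keyspace name
is a placeholder):

    ALTER KEYSPACE my_keyspace WITH replication = {
      'class': 'NetworkTopologyStrategy',
      'aws-us-west': '2', 'gce-us-central1': '2', 'gce-us-east1': '2'
    };

followed by 'nodetool repair -full my_keyspace' on each node in
gce-us-central1, now that I know a plain repair is incremental on this
version.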