So I suspect the problem may have been with the initial addition of the 10.128.0.20 node: it never seems to have synced data when I added it. It was at around 50 MB when it first came up and transitioned to "UN". After it was in, I did the 1 -> 2 replication change and tried a repair, but that didn't fix it. From what I can tell, all the data on it is data that has been written since it came up. We never delete data, so we should have zero tombstones.
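Just for clarity, here's roughly what that change and repair looked like (the keyspace name is a placeholder; the replication map is the one from my settings below):

    ALTER KEYSPACE my_ks WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'aws-us-west': '2',
        'gce-us-central1': '2',
        'gce-us-east1': '2'
    };

    # then, from a node in gce-us-central1:
    nodetool repair -dc gce-us-central1
    # and afterwards a full repair:
    nodetool repair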
If I am not mistaken, only two of my nodes actually have all the data, 10.128.0.3 and 10.142.0.14, since they agree on the data amount. 10.142.0.13 is almost a GB lower, and then of course there's 10.128.0.20, which is missing over 5 GB of data. I tried running 'nodetool repair -local' in both DCs and it didn't fix either one. Am I running into a bug of some kind?
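For reference, the repair I ran in each DC, plus a sketch of the tombstone check Bhuvan suggests below (keyspace/table names are placeholders; note that cqlsh has no LOCAL_ALL consistency level, so ALL is the closest equivalent):

    # from one node in each datacenter:
    nodetool repair -local

    -- in cqlsh; the trace output reports live vs. tombstone cells read:
    CONSISTENCY ALL
    TRACING ON
    SELECT count(*) FROM my_ks.my_cf;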
On Tue, May 24, 2016 at 4:06 PM Bhuvan Rawal <bhu1ra...@gmail.com> wrote:

> Hi Luke,
>
> You mentioned that the replication factor was increased from 1 to 2. In
> that case, was the node bearing IP 10.128.0.20 carrying around 3 GB of
> data earlier?
>
> You can run nodetool repair with the -local option to initiate a repair
> local to the gce-us-central1 datacenter.
>
> You may also suspect that if a lot of data was deleted while the node
> was down, it may be holding a lot of tombstones, which do not need to be
> replicated to the other node. To verify this, you can issue a
> select count(*) query on the column families (with the amount of data
> you have it should not be an issue) with tracing on and with consistency
> local_all, connecting to either 10.128.0.3 or 10.128.0.20, and store the
> output in a file. That will give you a fair idea of how many deleted
> cells the nodes have. I tried searching for a reference on whether
> tombstones are moved around during repair, but I didn't find evidence of
> it. However, I see no reason why they would be: if the node didn't have
> the data, then streaming tombstones does not make a lot of sense.
>
> Regards,
> Bhuvan
>
> On Tue, May 24, 2016 at 11:06 PM, Luke Jolly <l...@getadmiral.com>
> wrote:
>
>> Here's my setup:
>>
>> Datacenter: gce-us-central1
>> ===========================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
>> UN  10.128.0.3   6.4 GB     256     100.0%            3317a3de-9113-48e2-9a85-bbf756d7a4a6  default
>> UN  10.128.0.20  943.08 MB  256     100.0%            958348cb-8205-4630-8b96-0951bf33f3d3  default
>>
>> Datacenter: gce-us-east1
>> ========================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
>> UN  10.142.0.14  6.4 GB     256     100.0%            c3a5c39d-e1c9-4116-903d-b6d1b23fb652  default
>> UN  10.142.0.13  5.55 GB    256     100.0%            d0d9c30e-1506-4b95-be64-3dd4d78f0583  default
>>
>> And my replication settings are:
>>
>> {'class': 'NetworkTopologyStrategy', 'aws-us-west': '2',
>>  'gce-us-central1': '2', 'gce-us-east1': '2'}
>>
>> As you can see, 10.128.0.20 in the gce-us-central1 DC only has a load
>> of 943 MB even though it's supposed to own 100% and should have 6.4 GB.
>> 10.142.0.13 also seems not to have everything, as it only has a load of
>> 5.55 GB.
>>
>> On Mon, May 23, 2016 at 7:28 PM, kurt Greaves <k...@instaclustr.com>
>> wrote:
>>
>>> Do you have 1 node in each DC or 2? If you're saying you have 1 node
>>> in each DC, then an RF of 2 doesn't make sense. Can you clarify what
>>> your setup is?
>>>
>>> On 23 May 2016 at 19:31, Luke Jolly <l...@getadmiral.com> wrote:
>>>
>>>> I am running 3.0.5 with 2 nodes in two DCs, gce-us-central1 and
>>>> gce-us-east1. I increased the replication factor of gce-us-central1
>>>> from 1 to 2. Then I ran 'nodetool repair -dc gce-us-central1'. The
>>>> "Owns" for the node switched to 100% as it should, but the Load
>>>> showed that it didn't actually sync the data. I then ran a full
>>>> 'nodetool repair' and it still didn't fix it. This scares me, as I
>>>> thought 'nodetool repair' was a way to assure consistency and that
>>>> all the nodes were synced, but it doesn't seem to be. Outside of that
>>>> command, I have no idea how I would assure all the data was synced,
>>>> or how to get the data correctly synced, without decommissioning the
>>>> node and re-adding it.
>>>
>>> --
>>> Kurt Greaves
>>> k...@instaclustr.com
>>> www.instaclustr.com