After thinking about it more, I have no idea how that worked at all. I
must not have cleared out the working directory or something... Regardless,
I did something weird with my initial joining of the cluster and then
wasn't using repair -full. Thank y'all very much for the info.
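
For anyone who finds this thread later, the distinction that mattered
here (on 2.2+, where plain repair is incremental) is roughly the
following; "my_keyspace" is a placeholder for the real keyspace name:

    # Incremental repair -- what a plain "nodetool repair" runs on 2.2+
    nodetool repair my_keyspace

    # Full repair -- what actually brought the replicas back in sync here
    nodetool repair -full my_keyspace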
On Wed, May 25, 2016 at 3:11 PM Luke Jolly <l...@getadmiral.com> wrote:

> So I figured out the main cause of the problem. The seed node was set to
> itself. That's what got it in a weird state. The second part was that I
> didn't know the default repair is incremental, as I was accidentally
> looking at the wrong version's documentation. After running a repair
> -full, the 3 other nodes seem to be synced correctly, as they have
> identical loads. Strangely, the problem node 10.128.0.20 now has 10 GB
> of load (the others have 6 GB). Since I now know I started it off in a
> very weird state, I'm going to just decommission it and add it back in
> from scratch. When I added it, all working folders were cleared.
>
> I feel Cassandra should throw an error and fail to bootstrap / join if
> the seed node is set to itself.
>
> On Wed, May 25, 2016 at 2:37 AM Mike Yeap <wkk1...@gmail.com> wrote:
>
>> Hi Luke, I've encountered a similar problem before, could you please
>> advise on the following?
>>
>> 1) when you added 10.128.0.20, what were the seeds defined in
>> cassandra.yaml?
>>
>> 2) when you added 10.128.0.20, were the data and cache directories in
>> 10.128.0.20 empty?
>>
>>    - /var/lib/cassandra/data
>>    - /var/lib/cassandra/saved_caches
>>
>> 3) if you do a compact on 10.128.0.3, what is the size shown in the
>> "Load" column of "nodetool status <keyspace_name>"?
>>
>> 4) when you did the full repair, did you use "nodetool repair" or
>> "nodetool repair -full"? I'm asking this because incremental repair is
>> the default for Cassandra 2.2 and later.
>>
>> Regards,
>> Mike Yeap
>>
>> On Wed, May 25, 2016 at 8:01 AM, Bryan Cheng <br...@blockcypher.com>
>> wrote:
>>
>>> Hi Luke,
>>>
>>> I've never found nodetool status' load to be useful beyond a general
>>> indicator.
>>>
>>> You should expect some small skew, as this will depend on your current
>>> compaction status, tombstones, etc. IIRC repair will not provide
>>> consistency of intermediate states, nor will it remove tombstones; it
>>> only guarantees consistency in the final state. This means, in the
>>> case of dropped hints or mutations, you will see differences in
>>> intermediate states, and therefore storage footprint, even in fully
>>> repaired nodes. This includes intermediate UPDATE operations as well.
>>>
>>> Your one node with sub-1GB load sticks out like a sore thumb, though.
>>> Where did you originate the nodetool repair from? Remember that repair
>>> will only ensure consistency for ranges held by the node you're
>>> running it on. While I am not sure if missing ranges are included in
>>> this, if you ran nodetool repair only on a machine with partial
>>> ownership, you will need to complete repairs across the ring before
>>> data will return to full consistency.
>>>
>>> I would query some older data using consistency = ONE on the affected
>>> machine to determine if you are actually missing data. There are a few
>>> outstanding bugs in the 2.1.x and older release families that may
>>> result in tombstone creation even without deletes, for example
>>> CASSANDRA-10547, which impacts updates on collections in pre-2.1.13
>>> Cassandra.
>>>
>>> You can also try examining the output of nodetool ring, which will
>>> give you a breakdown of tokens and their associations within your
>>> cluster.
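>>>
>>> For example, something along these lines from cqlsh on the light node
>>> (keyspace, table, and key below are placeholders for your own schema):
>>>
>>>     $ cqlsh 10.128.0.20
>>>     cqlsh> CONSISTENCY ONE;
>>>     cqlsh> SELECT * FROM my_keyspace.my_table WHERE id = 'old-key' LIMIT 10;
>>>
>>> If rows you know were written well before the node joined come back
>>> empty at ONE, the node is genuinely missing data rather than just
>>> showing a smaller on-disk footprint.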
>>>
>>> --Bryan
>>>
>>> On Tue, May 24, 2016 at 3:49 PM, kurt Greaves <k...@instaclustr.com>
>>> wrote:
>>>
>>>> Not necessarily, considering RF is 2, so both nodes should have all
>>>> partitions. Luke, are you sure the repair is succeeding? You don't
>>>> have other keyspaces/duplicate data/extra data in your cassandra data
>>>> directory? Also, you could try querying on the node with less data to
>>>> confirm whether it has the same dataset.
>>>>
>>>> On 24 May 2016 at 22:03, Bhuvan Rawal <bhu1ra...@gmail.com> wrote:
>>>>
>>>>> For the other DC, it can be acceptable because partitions reside on
>>>>> one node, so if you have a large partition, it may skew things a
>>>>> bit.
>>>>>
>>>>> On May 25, 2016 2:41 AM, "Luke Jolly" <l...@getadmiral.com> wrote:
>>>>>
>>>>>> So I guess the problem may have been with the initial addition of
>>>>>> the 10.128.0.20 node, because when I added it in, it never synced
>>>>>> data, I guess? It was at around 50 MB when it first came up and
>>>>>> transitioned to "UN". After it was in, I did the 1->2 replication
>>>>>> change and tried repair, but it didn't fix it. From what I can
>>>>>> tell, all the data on it is stuff that has been written since it
>>>>>> came up. We never delete data, ever, so we should have zero
>>>>>> tombstones.
>>>>>>
>>>>>> If I am not mistaken, only two of my nodes actually have all the
>>>>>> data, 10.128.0.3 and 10.142.0.14, since they agree on the data
>>>>>> amount. 10.142.0.13 is almost a GB lower, and then of course there
>>>>>> is 10.128.0.20, which is missing over 5 GB of data. I tried running
>>>>>> nodetool repair -local on both DCs and it didn't fix either one.
>>>>>>
>>>>>> Am I running into a bug of some kind?
>>>>>>
>>>>>> On Tue, May 24, 2016 at 4:06 PM Bhuvan Rawal <bhu1ra...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Luke,
>>>>>>>
>>>>>>> You mentioned that the replication factor was increased from 1 to
>>>>>>> 2. In that case, was the node bearing IP 10.128.0.20 carrying
>>>>>>> around 3 GB of data earlier?
>>>>>>>
>>>>>>> You can run nodetool repair with the -local option to initiate a
>>>>>>> repair local to the gce-us-central1 datacenter.
>>>>>>>
>>>>>>> Also, you may suspect that if a lot of data was deleted while the
>>>>>>> node was down, it may be holding a lot of tombstones which do not
>>>>>>> need to be replicated to the other node. To verify this, you can
>>>>>>> issue a select count(*) query on the column families (with the
>>>>>>> amount of data you have it should not be an issue) with tracing on
>>>>>>> and with consistency ALL, connecting to either 10.128.0.3 or
>>>>>>> 10.128.0.20, and store the output in a file. It will give you a
>>>>>>> fair idea of how many deleted cells the nodes have. I tried
>>>>>>> searching for a reference on whether tombstones are moved around
>>>>>>> during repair, but I didn't find evidence of it. However, I see no
>>>>>>> reason they would be, because if the node didn't have the data,
>>>>>>> then streaming tombstones does not make a lot of sense.
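>>>>>>>
>>>>>>> Something like this in cqlsh would do it (table name is a
>>>>>>> placeholder; the trace reports live rows and tombstone cells read
>>>>>>> per node):
>>>>>>>
>>>>>>>     $ cqlsh 10.128.0.3
>>>>>>>     cqlsh> CONSISTENCY ALL;
>>>>>>>     cqlsh> TRACING ON;
>>>>>>>     cqlsh> SELECT count(*) FROM my_keyspace.my_table;
>>>>>>>
>>>>>>> A large tombstone count on one node but not the other would
>>>>>>> support the deleted-data theory.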
>>>>>>>
>>>>>>> Regards,
>>>>>>> Bhuvan
>>>>>>>
>>>>>>> On Tue, May 24, 2016 at 11:06 PM, Luke Jolly <l...@getadmiral.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Here's my setup:
>>>>>>>>
>>>>>>>> Datacenter: gce-us-central1
>>>>>>>> ===========================
>>>>>>>> Status=Up/Down
>>>>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>>>>> --  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
>>>>>>>> UN  10.128.0.3   6.4 GB     256     100.0%            3317a3de-9113-48e2-9a85-bbf756d7a4a6  default
>>>>>>>> UN  10.128.0.20  943.08 MB  256     100.0%            958348cb-8205-4630-8b96-0951bf33f3d3  default
>>>>>>>>
>>>>>>>> Datacenter: gce-us-east1
>>>>>>>> ========================
>>>>>>>> Status=Up/Down
>>>>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>>>>> --  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
>>>>>>>> UN  10.142.0.14  6.4 GB     256     100.0%            c3a5c39d-e1c9-4116-903d-b6d1b23fb652  default
>>>>>>>> UN  10.142.0.13  5.55 GB    256     100.0%            d0d9c30e-1506-4b95-be64-3dd4d78f0583  default
>>>>>>>>
>>>>>>>> And my replication settings are:
>>>>>>>>
>>>>>>>> {'class': 'NetworkTopologyStrategy', 'aws-us-west': '2',
>>>>>>>> 'gce-us-central1': '2', 'gce-us-east1': '2'}
>>>>>>>>
>>>>>>>> As you can see, 10.128.0.20 in the gce-us-central1 DC only has a
>>>>>>>> load of 943 MB even though it's supposed to own 100% and should
>>>>>>>> have 6.4 GB. 10.142.0.13 also seems not to have everything, as it
>>>>>>>> only has a load of 5.55 GB.
>>>>>>>>
>>>>>>>> On Mon, May 23, 2016 at 7:28 PM, kurt Greaves
>>>>>>>> <k...@instaclustr.com> wrote:
>>>>>>>>
>>>>>>>>> Do you have 1 node in each DC or 2? If you're saying you have 1
>>>>>>>>> node in each DC, then an RF of 2 doesn't make sense. Can you
>>>>>>>>> clarify what your setup is?
>>>>>>>>>
>>>>>>>>> On 23 May 2016 at 19:31, Luke Jolly <l...@getadmiral.com> wrote:
>>>>>>>>>
>>>>>>>>>> I am running 3.0.5 with 2 nodes in two DCs, gce-us-central1 and
>>>>>>>>>> gce-us-east1. I increased the replication factor of
>>>>>>>>>> gce-us-central1 from 1 to 2. Then I ran 'nodetool repair -dc
>>>>>>>>>> gce-us-central1'. The "Owns" for the node switched to 100% as
>>>>>>>>>> it should, but the Load showed that it didn't actually sync the
>>>>>>>>>> data. I then ran a full 'nodetool repair' and it still didn't
>>>>>>>>>> fix it. This scares me, as I thought 'nodetool repair' was a
>>>>>>>>>> way to assure consistency and that all the nodes were synced,
>>>>>>>>>> but it doesn't seem to be. Outside of that command, I have no
>>>>>>>>>> idea how I would assure all the data was synced, or how to get
>>>>>>>>>> the data correctly synced, without decommissioning the node and
>>>>>>>>>> re-adding it.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Kurt Greaves
>>>>>>>>> k...@instaclustr.com
>>>>>>>>> www.instaclustr.com
>>>>
>>>> --
>>>> Kurt Greaves
>>>> k...@instaclustr.com
>>>> www.instaclustr.com
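
P.S. For completeness, the replication change that kicked this all off
was a plain ALTER KEYSPACE, something along these lines (keyspace name
is a placeholder):

    ALTER KEYSPACE my_keyspace WITH replication = {
      'class': 'NetworkTopologyStrategy',
      'aws-us-west': '2', 'gce-us-central1': '2', 'gce-us-east1': '2'
    };

followed by 'nodetool repair -full my_keyspace' on each node in
gce-us-central1, now that I know a plain repair is incremental on this
version.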