Re: Nodetool repair with Load times 5
Dear Alain,

Thanks again for your precious help.

> I might help, but I need to know what you have done recently (changed the RF, added/removed nodes, cleanups, anything else, as much as possible...)

I have a cluster of 5 nodes, all running Cassandra 2.1.8. I have a fixed schema which never changes. I have not changed the RF; it is 3. I have not removed nodes, and have done no cleanups. Basically, here are the important operations I have done:

- Installed Cassandra 2.1.7 on a cluster of 5 nodes with RF 3, using Size-Tiered compaction.
- Inserted 2 billion rows (bulk load).
- Ran loads of SELECT statements and verified that the data is good.
- Did some deletes and a few more inserts.
- Eventually migrated to 2.1.8.
- Then only very few deletes/inserts.
- Took a few snapshots.

When I ran "nodetool status" I always got a load of about 200 GB on *all* nodes. Then I ran "nodetool -h node0 repair -par -pr -inc", and after that I had a completely different picture:

nodetool -h zennode0 status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns  Host ID                               Rack
UN  192.168.2.104  941.49 GB  256     ?     c13e0858-091c-47c4-8773-6d6262723435  rack1
UN  192.168.2.100  1.07 TB    256     ?     c32a9357-e37e-452e-8eb1-57d86314b419  rack1
UN  192.168.2.101  189.72 GB  256     ?     9af90dea-90b3-4a8a-b88a-0aeabe3cea79  rack1
UN  192.168.2.102  948.61 GB  256     ?     8eb7a5bb-6903-4ae1-a372-5436d0cc170c  rack1
UN  192.168.2.103  197.27 GB  256     ?     9efc6f13-2b02-4400-8cde-ae831feb86e9  rack1

> Also, could you please do "nodetool status myks" for your keyspace(s)? We will then be able to know the theoretical ownership of each node on your distinct (or unique) keyspace(s)?
nodetool -h zennode0 status XYZdata
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.2.104  941.49 GB  256     62.5%             c13e0858-091c-47c4-8773-6d6262723435  rack1
UN  192.168.2.100  1.07 TB    256     58.4%             c32a9357-e37e-452e-8eb1-57d86314b419  rack1
UN  192.168.2.101  189.72 GB  256     58.4%             9af90dea-90b3-4a8a-b88a-0aeabe3cea79  rack1
UN  192.168.2.102  948.61 GB  256     60.1%             8eb7a5bb-6903-4ae1-a372-5436d0cc170c  rack1
UN  192.168.2.103  197.27 GB  256     60.6%             9efc6f13-2b02-4400-8cde-ae831feb86e9  rack1

> Some ideas: You repaired only a primary range (-pr) of one node. With an RF of 3, and since you have 3 big nodes, this would be almost normal if you were not using vnodes (except for the gap 200 GB -> 1 TB, which is huge, unless you messed up the RF). So, are you using them?

My schema is totally fixed and I have used RF 3 since the beginning. Sorry, I'm not too acquainted with vnodes. I have not changed anything in cassandra.yaml except the seeds and the name of the cluster.

> 2/ Load is barely the size of the data on each node

If it is the size of the data, how can it fit on the disk? My 5 nodes each have a 1 TB SSD drive, and here is the disk usage for each of them:

node0: 25%
node1: 25%
node2: 24%
node3: 26%
node4: 29%

nodetool status says that the load for node0 is 1.07 TB. That is more than its disk can hold, yet the disk usage for node0 is 25%. This is not clear to me; the Load in the nodetool status output seems to be more than "the size of the data on a node".

On 18 Aug 2015, at 19:29, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Hi Jean,
>
> I might help, but I need to know what you have done recently (changed the RF, added/removed nodes, cleanups, anything else, as much as possible...)
>
> Also, could you please do "nodetool status myks" for your keyspace(s)? We will then be able to know the theoretical ownership of each node on your distinct (or unique) keyspace(s)?
>
> Some ideas: You repaired only a primary range (-pr) of one node. With an RF of 3 and 3 big nodes, if not using vnodes, this would be almost normal (except for the gap 200 GB -> 1 TB, which is huge, unless you messed up the RF). So, are you using them?
>
> Answers:
> 1/ It depends on what happened to this cluster (see my questions above)
> 2/ Load is barely the size of the data on each node
> 3/ No, this is not a normal nor stable situation.
> 4/ No, -pr means you repaired only the partition ranges that node is responsible for (depends on tokens); you have to run this on all nodes. But I would wait to find out first what's happening, to avoid hitting the threshold on disk space or whatever.

I guess I was confused by the -par switch, which suggested to me that the work would be done in parallel and therefore on all nodes. So, if I understand right, one should do a "nodetool repair -par -pr -inc" on all nodes, one after the other? Is this correct?

I have a second cluster, a smaller one,
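[Editorial note] Running the primary-range repair on every node, one after the other, as discussed above, might look like the sketch below. The hostnames are the ones from the status output in this thread; the actual nodetool invocation is commented out so this is only an illustration, not a definitive procedure.

```shell
# Sketch: run a primary-range incremental repair on each node sequentially.
# Host list is taken from the status output above -- adjust for your cluster.
NODES="192.168.2.100 192.168.2.101 192.168.2.102 192.168.2.103 192.168.2.104"

repaired=0
for node in $NODES; do
    echo "repairing primary ranges on $node"
    # On a live cluster, uncomment the next line:
    # nodetool -h "$node" repair -par -pr -inc
    repaired=$((repaired + 1))
done
echo "visited $repaired nodes"
```

Because -pr repairs only the ranges each node is primary for, every node must be visited once for the whole ring to be repaired; -par only parallelizes the work among replicas for the ranges being repaired.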
Nodetool repair with Load times 5
Hi,

I have a phenomenon I cannot explain, and I would like to understand it. I'm running Cassandra 2.1.8 on a cluster of 5 nodes. I'm using replication factor 3, with mostly default settings. Last week I did a nodetool status which gave me, on each node, a load of about 200 GB. Since then there were no deletes and no inserts. This weekend I did:

nodetool -h 192.168.2.100 repair -pr -par -inc

And now when I run nodetool status I see a completely new picture!

nodetool -h zennode0 status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns  Host ID                               Rack
UN  192.168.2.104  940.73 GB  256     ?     c13e0858-091c-47c4-8773-6d6262723435  rack1
UN  192.168.2.100  1.07 TB    256     ?     c32a9357-e37e-452e-8eb1-57d86314b419  rack1
UN  192.168.2.101  189.03 GB  256     ?     9af90dea-90b3-4a8a-b88a-0aeabe3cea79  rack1
UN  192.168.2.102  951.28 GB  256     ?     8eb7a5bb-6903-4ae1-a372-5436d0cc170c  rack1
UN  192.168.2.103  196.54 GB  256     ?     9efc6f13-2b02-4400-8cde-ae831feb86e9  rack1

The nodes 192.168.2.101 and .103 are about what they were last week, but the three other nodes now have a load which is about 5 times bigger!

1) Is this normal?
2) What is the meaning of the column Load?
3) Is there anything to fix? Can I leave it like that? It's strange that I'm asking how to fix things after I did a *repair*.

Thanks a lot for your help.
Kind regards
Jean
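[Editorial note] The pre-repair numbers are at least self-consistent: with RF 3 on 5 nodes, each row is stored on 3 nodes, so the cluster-wide sum of Load should be about 3x the raw data size. A back-of-the-envelope check (the 333 GB raw figure is an estimate, not stated in the thread):

```shell
# Back-of-the-envelope: with replication factor 3, every row lives on
# 3 of the 5 nodes, so total Load across the cluster is ~3x the raw data.
raw_gb=333      # hypothetical unreplicated data size in GB
rf=3
nodes=5
total_gb=$((raw_gb * rf))
per_node_gb=$((total_gb / nodes))
echo "expected total load: ${total_gb} GB"
echo "expected per-node load: ${per_node_gb} GB"
```

This yields roughly 200 GB per node, matching the "about 200 GB on all nodes" observed before the repair; the ~3.3 TB total seen afterwards is what makes the situation anomalous.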
Re: Nodetool repair with Load times 5
Hey Jean,

Did you try running a nodetool cleanup on all your nodes, perhaps one at a time?

On Tue, Aug 18, 2015 at 3:59 AM, Jean Tremblay <jean.tremb...@zen-innovations.com> wrote:
> Hi, I have a phenomena I cannot explain, and I would like to understand. [...]
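[Editorial note] A one-node-at-a-time cleanup pass, as suggested here, could look like this sketch (hostnames hypothetical, nodetool line commented out). Note that cleanup removes data a node no longer owns, so it mainly helps after topology or RF changes.

```shell
# Sketch: run nodetool cleanup on each node in turn, never in parallel,
# since cleanup is compaction-heavy and best done one node at a time.
NODES="192.168.2.100 192.168.2.101 192.168.2.102 192.168.2.103 192.168.2.104"
cleaned=0
for node in $NODES; do
    echo "cleanup on $node"
    # nodetool -h "$node" cleanup   # uncomment on a live cluster
    cleaned=$((cleaned + 1))
done
```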
Re: Nodetool repair with Load times 5
No, I did not try. I would like to understand what is going on before I make my problem maybe even worse. I really would like to understand:

1) Is this normal?
2) What is the meaning of the column Load?
3) Is there anything to fix? Can I leave it like that?
4) Did I do something wrong? When you use -par you only need to run repair from one node, right? E.g. nodetool -h 192.168.2.100 repair -pr -par -inc

Thanks for your feedback.
Jean

On 18 Aug 2015, at 14:33, Mark Greene <green...@gmail.com> wrote:
> Hey Jean, Did you try running a nodetool cleanup on all your nodes, perhaps one at a time? [...]
Re: Nodetool repair with Load times 5
Hi Jean,

I might help, but I need to know what you have done recently (changed the RF, added/removed nodes, cleanups, anything else, as much as possible...)

Also, could you please do "nodetool status *myks*" for your keyspace(s)? We will then be able to know the theoretical ownership of each node on your distinct (or unique) keyspace(s)?

Some ideas: You repaired only a primary range (-pr) of one node. With an RF of 3 and 3 big nodes, if not using vnodes, this would be almost normal (except for the gap 200 GB -> 1 TB, which is huge, unless you messed up the RF). So, are you using them?

Answers:
1/ It depends on what happened to this cluster (see my questions above)
2/ Load is barely the size of the data on each node
3/ No, this is not a normal nor stable situation.
4/ No, -pr means you repaired only the partition ranges that node is responsible for (depends on tokens); you have to run this on all nodes. But I would wait to find out first what's happening, to avoid hitting the threshold on disk space or whatever.

Anyway, see if you can give us more info related to this.

C*heers,
Alain

2015-08-18 14:40 GMT+02:00 Jean Tremblay <jean.tremb...@zen-innovations.com>:
> No. I did not try. I would like to understand what is going on before I make my problem maybe even worse. [...]
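[Editorial note] Since "Load is barely the size of the data on each node", a quick way to cross-check the Load figure against what is actually on disk is to measure the data directory directly. In 2.1, Load is the file system data under the Cassandra data directory excluding the snapshots subdirectories, so snapshots (which Jean took) show up in disk usage but not in Load. A hedged sketch, assuming the default data directory path:

```shell
# Sketch: compare nodetool's Load figure with on-disk usage.
# Assumes the default data directory; adjust CASSANDRA_DATA for your install.
DATA_DIR="${CASSANDRA_DATA:-/var/lib/cassandra/data}"
if [ -d "$DATA_DIR" ]; then
    # Live data: roughly what the Load column should report
    # (Load excludes the snapshots subdirectories).
    du -sh "$DATA_DIR"
    # Snapshot hard links -- cheap at creation time, but they pin old
    # SSTables on disk after compaction and are easy to forget.
    du -sh "$DATA_DIR"/*/*/snapshots 2>/dev/null
else
    echo "data dir not found: $DATA_DIR"
fi
```

Comparing these numbers per node against nodetool status would show whether the reported 1.07 TB Load reflects real files or stale size accounting.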