Re: Restarting nodes and reported load

2017-06-02 Thread Daniel Steuernol
Thanks for the info; this gives me a lot to go through, especially Al Tobey's guide. I'm running Java version "1.8.0_121" and using G1GC as the garbage collector.

Re: Restarting nodes and reported load

2017-06-01 Thread Victor Chen
Regarding mtime, I'm just talking about using something like the following (assuming you are on Linux): "find pathtoyourdatadir -mtime -1 -ls", which will find all files in your datadir last modified within the past 24h. You can compare the increase in your reported nodetool load within the past N days
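A rough Python equivalent of the find command above, in case it is easier to total the sizes that way. This is only a sketch: the function names are made up for illustration, and the data directory path in the usage note is an assumption (substitute whatever data_file_directories points at in your cassandra.yaml).

```python
import os
import time


def recently_modified(data_dir, days=1.0, now=None):
    """Return (path, size_bytes) for every file under data_dir whose
    modification time falls within the last `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    hits = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                st = os.stat(path)
            except OSError:
                # Files can disappear mid-scan, e.g. when compaction
                # removes an SSTable; skip them.
                continue
            if st.st_mtime >= cutoff:
                hits.append((path, st.st_size))
    return hits


def total_bytes(hits):
    """Sum the sizes of the files returned by recently_modified()."""
    return sum(size for _path, size in hits)
```

For example, total_bytes(recently_modified("/var/lib/cassandra/data", days=1)) gives the bytes written or rewritten in the last 24h (the path here is an assumed default, not something from this thread). Running it with a few different values for days gives numbers to set against the growth in reported load.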

Re: Restarting nodes and reported load

2017-06-01 Thread Daniel Steuernol
I'll try to answer the questions from the last 2 messages. Network traffic looks pretty steady overall, about 0.5 up to 2 megabytes/s. The cluster handles about 100k to 500k operations per minute; right now the read/write split is about 50/50, eventually though it will probably

Re: Restarting nodes and reported load

2017-06-01 Thread Victor Chen
Hi Daniel, In my experience, when a node shows DN and then comes back up by itself, that sounds like some sort of GC pause (especially if nodetool status, when run from the "DN" node itself, shows it is up, assuming there isn't a spotty network issue). Perhaps I missed this info due to length of thread

Re: Restarting nodes and reported load

2017-06-01 Thread daemeon reiydelle
Some random thoughts; I would like to thank you for giving us an interesting problem. Cassandra can get boring sometimes, it is too stable. - Do you have a way to monitor the network traffic to see if it is increasing between restarts or does it seem relatively flat? - What activities are

Re: Restarting nodes and reported load

2017-06-01 Thread Daniel Steuernol
I am just restarting cassandra. I'm not having any disk space issues I think, but we're having issues where operations have increased latency, and these are fixed by a restart. It seemed like the load reported by nodetool status might be helpful in understanding what is going wrong but I'm not

Re: Restarting nodes and reported load

2017-05-31 Thread Anthony Grasso
Hi Daniel, When you say that the nodes have to be restarted, are you just restarting the Cassandra service or are you restarting the machine? How are you reclaiming disk space at the moment? Does disk space free up after the restart? Regarding storage on nodes, keep in mind the more data stored

Re: Restarting nodes and reported load

2017-05-30 Thread Jonathan Haddad
You're the only one I see in the thread that's made any reference to HDFS. The OP even noted that his question is about C*, not HDFS. On Tue, May 30, 2017 at 2:59 PM daemeon reiydelle wrote: > Did you notice that HDFS is the distributed file system used?

Re: Restarting nodes and reported load

2017-05-30 Thread daemeon reiydelle
Did you notice that HDFS is the distributed file system used?

Re: Restarting nodes and reported load

2017-05-30 Thread Jonathan Haddad
Daniel - my comment wasn't to you, it was in response to Daemeon. > no, 3tb is small. 30-50tb of hdfs space is typical these days per hdfs node Jon On Tue, May 30, 2017 at 2:30 PM Daniel Steuernol wrote: > My question is about cassandra, ultimately I'm trying to figure

Re: Restarting nodes and reported load

2017-05-30 Thread Daniel Steuernol
My question is about Cassandra; ultimately I'm trying to figure out why our cluster's performance degrades approximately every 6 days. I noticed that the load as reported by nodetool status was very high, but that might be unrelated to the problem. A restart solves the performance problem. I've

Re: Restarting nodes and reported load

2017-05-30 Thread Jonathan Haddad
This isn't an HDFS mailing list. On Tue, May 30, 2017 at 2:14 PM daemeon reiydelle wrote: > no, 3tb is small. 30-50tb of hdfs space is typical these days per hdfs > node. Depends somewhat on whether there is a mix of more and less > frequently accessed data. But even storing

Re: Restarting nodes and reported load

2017-05-30 Thread daemeon reiydelle
No, 3 TB is small. 30-50 TB of HDFS space is typical these days per HDFS node. Depends somewhat on whether there is a mix of more and less frequently accessed data. But even storing only hot data, never saw anything less than 20 TB HDFS per node.

Re: Restarting nodes and reported load

2017-05-30 Thread tommaso barbugli
Am I the only one thinking 3TB is way too much data for a single node on a VM? On Tue, May 30, 2017 at 10:36 PM, Daniel Steuernol wrote: > I don't believe incremental repair is enabled, I have never enabled it on > the cluster, and unless it's the default then it is off.

Re: Restarting nodes and reported load

2017-05-30 Thread daemeon reiydelle
No degradation.

Re: Restarting nodes and reported load

2017-05-30 Thread Daniel Steuernol
That does sound like what's happening. Did performance degrade as the reported load increased? On May 30 2017, at 1:52 pm, daemeon reiydelle wrote: OK, thanks. So there was a bug in a prior version of

Re: Restarting nodes and reported load

2017-05-30 Thread daemeon reiydelle
OK, thanks. So there was a bug in a prior version of C*, symptoms were: Nodetool would show increasing load utilization over time. Stopping and restarting C* nodes would reset the storage back to what one would expect on that node, for a while, then it would creep upwards again, until the

Re: Restarting nodes and reported load

2017-05-30 Thread Daniel Steuernol
I don't believe incremental repair is enabled, I have never enabled it on the cluster, and unless it's the default then it is off. Also I don't see a setting in cassandra.yaml for it. On May 30 2017, at 1:10 pm, daemeon reiydelle wrote:

Re: Restarting nodes and reported load

2017-05-30 Thread daemeon reiydelle
Unless there is a bug, snapshots are excluded (they are not HDFS anyway!) from nodetool status. Out of curiosity, is incremental repair enabled? This is almost certainly a rat hole, but there was an issue a few releases back where load would only increase until the node was restarted. Had been

Re: Restarting nodes and reported load

2017-05-30 Thread Daniel Steuernol
Incremental backup is set to false in the config file, and I have set snapshot_before_compaction and auto_snapshot to false as well. I ran nodetool clearsnapshot, but before doing that I ran nodetool listsnapshots and it listed a bunch of snapshots. I would have expected that to be empty because
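To cross-check what nodetool listsnapshots reports against what is actually on disk, something like the following Python sketch could be used. It assumes the usual on-disk layout of data_dir/keyspace/table/snapshots/snapshot_name/; the function name is hypothetical. Note that snapshot files are hard links to SSTables, so the totals are apparent sizes and only count as extra disk usage once the original SSTables have been compacted away.

```python
import os
from collections import defaultdict


def snapshot_usage(data_dir):
    """Map snapshot name -> total apparent bytes of the files found under
    <data_dir>/<keyspace>/<table>/snapshots/<snapshot_name>/ (assumed layout)."""
    usage = defaultdict(int)
    for root, _dirs, files in os.walk(data_dir):
        # Work on the path relative to data_dir so that a "snapshots"
        # component in data_dir itself is not misread.
        parts = os.path.relpath(root, data_dir).split(os.sep)
        if "snapshots" not in parts:
            continue
        idx = parts.index("snapshots")
        if idx + 1 >= len(parts):
            continue  # the snapshots directory itself, no snapshot name yet
        name = parts[idx + 1]
        for fname in files:
            try:
                usage[name] += os.path.getsize(os.path.join(root, fname))
            except OSError:
                continue  # file removed while scanning
    return dict(usage)
```

An empty dict after running nodetool clearsnapshot would confirm that no snapshot directories with files are left under the data directory.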

Re: Restarting nodes and reported load

2017-05-30 Thread Varun Gupta
Can you please check whether you have incremental backup enabled and whether snapshots are occupying the space? Run the nodetool clearsnapshot command. On Tue, May 30, 2017 at 11:12 AM, Daniel Steuernol wrote: > It's 3-4TB per node, and by load rises, I'm talking about load as reported >

Re: Restarting nodes and reported load

2017-05-30 Thread Daniel Steuernol
It's 3-4TB per node, and by load rises, I'm talking about load as reported by nodetool status. On May 30 2017, at 10:25 am, daemeon reiydelle wrote: When you say "the load rises ... ", could you

Re: Restarting nodes and reported load

2017-05-30 Thread daemeon reiydelle
When you say "the load rises ... ", could you clarify what you mean by "load"? That term has a specific meaning in Linux, and in e.g. Cloudera Manager. But in neither case would that be relevant to transient or persisted disk. Am I missing something? On Tue, May 30, 2017 at 10:18 AM, tommaso barbugli

Re: Restarting nodes and reported load

2017-05-30 Thread tommaso barbugli
3-4 TB per node or in total? On Tue, May 30, 2017 at 6:48 PM, Daniel Steuernol wrote: > I should also mention that I am running cassandra 3.10 on the cluster > > > > On May 29 2017, at 9:43 am, Daniel Steuernol > wrote: > >> The cluster is running

Re: Restarting nodes and reported load

2017-05-30 Thread Daniel Steuernol
I should also mention that I am running cassandra 3.10 on the cluster On May 29 2017, at 9:43 am, Daniel Steuernol wrote: The cluster is running with RF=3, right now each node is storing about 3-4

Re: Restarting nodes and reported load

2017-05-29 Thread Daniel Steuernol
The cluster is running with RF=3; right now each node is storing about 3-4 TB of data. I'm using r4.2xlarge EC2 instances, which have 8 vCPUs and 61 GB of RAM, and the disks attached for the data drive are gp2 SSD EBS volumes with 10k IOPS. I guess this brings up the question of what's a good marker

Re: Restarting nodes and reported load

2017-05-29 Thread tommaso barbugli
Hi Daniel, This is not normal. Possibly a capacity problem. What's the RF, how much data do you store per node, and what kind of servers do you use (core count, RAM, disk, ...)? Cheers, Tommaso On Mon, May 29, 2017 at 6:22 PM, Daniel Steuernol wrote: > > I am running a 6