Re: Restarting nodes and reported load

2017-06-02 Thread Daniel Steuernol
Thanks for the info, this provides a lot to go through, especially Al Tobey's guide.  I'm running java version "1.8.0_121" and using G1GC for the gc type.
  


Re: Restarting nodes and reported load

2017-06-01 Thread Victor Chen
Regarding mtime, I'm just talking about using something like the following
(assuming you are on linux): "find pathtoyourdatadir -mtime -1 -ls", which
will find all files in your datadir last modified within the past 24h. You
can compare the increase in your reported nodetool load over the past N days
and then use the same period of time to look for modified files that could
match that size. Not really sure what sort of load or how long that would
take on 3-4T of data though.
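A minimal, self-contained sketch of that check, using a throwaway directory
and made-up SSTable file names in place of a real data dir (assumes GNU
coreutils for "touch -d"):

```shell
# Throwaway stand-in for the real Cassandra data dir; the keyspace, table,
# and SSTable file names below are made up for illustration.
datadir=$(mktemp -d)
mkdir -p "$datadir/ks/tbl"
touch "$datadir/ks/tbl/mc-1-big-Data.db"                  # modified just now
touch -d '3 days ago' "$datadir/ks/tbl/mc-2-big-Data.db"  # outside the window

# Files modified within the past 24h -- candidates for recent load growth:
recent=$(find "$datadir" -type f -mtime -1)
echo "$recent"

# Total on-disk size of just those recent files, to compare against the
# growth in nodetool load over the same period:
find "$datadir" -type f -mtime -1 -exec du -ch {} + | tail -1
```

On a real node you would point find at the actual data directory and skip
the setup lines.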

Regarding compactionstats and tpstats, I would just be interested if there
are increasing "pending" tasks for either. Did you say you observed latency
issues or degraded performance or not?
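As a sketch of what that pending-tasks check might look like (the tpstats
output below is a fabricated stand-in; real output has more pools and
columns):

```shell
# Fabricated stand-in for `nodetool tpstats` output; here column 3 is the
# Pending count for each thread pool.
sample='Pool Name                    Active   Pending      Completed
MutationStage                     0       128        9200764
ReadStage                         2         0        1245983
CompactionExecutor                1        37         115364'

# Flag any pool whose Pending count is nonzero; numbers that keep growing
# between restarts are the thing to watch.
flagged=$(echo "$sample" | awk 'NR > 1 && $3 > 0 { print $1, "pending:", $3 }')
echo "$flagged"
```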

What version of java/cassandra did you say you were running and what type
of gc are you using?

Regarding a node not creating a "DOWN" entry in its log: if a node
experiences a sufficiently long gc pause (I'm not sure what the threshold
is, maybe somebody more knowledgeable can chime in?), then even though the
node itself still "thinks" it's up, *other* nodes will mark it as DN. Thus
you wouldn't see an "is now DOWN" entry in the system.log of the gc-ing
node, but you *would* see an "is now DOWN" entry in the system.log of the
remote nodes (and a corresponding "is now UP" entry when the node comes out
of its gc pause). Assuming the logs have not been rotated off, if you just
grep system.log for "DOWN" on your nodes, that usually reveals a useful
timestamp from which to start looking in the problematic node's system.log
or gc.log.
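For example, a quick way to pull those timestamps out (the log lines below
are fabricated examples of the Gossiper lines Cassandra writes; on a real
node you would grep the actual system.log instead):

```shell
# Fabricated sample of Gossiper output in system.log.
log='INFO  [GossipStage:1] 2017-06-01 14:02:11,532 Gossiper.java:1008 - InetAddress /10.0.0.12 is now DOWN
INFO  [GossipStage:1] 2017-06-01 14:03:47,104 Gossiper.java:995 - InetAddress /10.0.0.12 is now UP'

# Extract date, time, and state, giving timestamps to cross-check against
# the problematic node's system.log and gc.log:
events=$(echo "$log" | grep -E 'is now (DOWN|UP)' | awk '{ print $3, $4, $NF }')
echo "$events"
```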

Do you have persistent cpu/memory/disk io/disk space monitoring mechanisms?
You should think about putting something in place to gather that info if you
don't ... I find myself coming back to Al Tobey's tuning guide
frequently, if nothing else for the tools he mentions and his notes on the
java gc. I want to say a heap size of 15G sounds a little high, but I am
starting to talk a bit out of my depth when it comes to java tuning. See
datastax's official cassandra 2.1 jvm tuning doc
and also this stackoverflow thread.
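For what it's worth, here is a sketch of the default heap sizing logic in
cassandra-env.sh, reconstructed from memory (so treat the exact formula as
an assumption): it takes the larger of min(half the RAM, 1G) and min(a
quarter of the RAM, 8G), which for a 60GB node lands well below the 15G in
use:

```shell
# Rough reconstruction (from memory -- an assumption, not the authoritative
# script) of cassandra-env.sh's default:
#   MAX_HEAP_SIZE = max(min(ram/2, 1024MB), min(ram/4, 8192MB))
ram_mb=61440   # roughly the 60GB on these nodes

half=$(( ram_mb / 2 ));    [ "$half" -gt 1024 ] && half=1024
quarter=$(( ram_mb / 4 )); [ "$quarter" -gt 8192 ] && quarter=8192
heap=$half;                [ "$quarter" -gt "$heap" ] && heap=$quarter

echo "default MAX_HEAP_SIZE: ${heap}MB"   # prints 8192MB for this ram_mb
```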


good luck!




Re: Restarting nodes and reported load

2017-06-01 Thread Daniel Steuernol
I'll try to capture answers to the questions in the last 2 messages.

Network traffic looks pretty steady overall, about 0.5 up to 2 megabytes/s. The cluster handles about 100k to 500k operations per minute; the read/write split is about 50/50 right now, though eventually it will probably be 70% writes and 30% reads.

There do seem to be some nodes that are affected more frequently than others. I haven't captured cpu/memory stats vs other nodes at the time the problem is occurring; I will do that next time it happens. I will also look at compaction stats and tpstats. What are some things that I should be looking for in tpstats in particular? I'm not exactly sure how to read the output from that command.

The heap size is set to 15GB on each node, and each node has 60GB of RAM available.

In regards to the "... is now DOWN" messages: I'm unable to find one in the system.log for a time I know that a node was having problems. I've built a system that polls nodetool status and parses the output, and if it sees a node reporting as DN it sends a message to a slack channel. Is it possible for a node to report as DN, but not have the message show up in the log? The system polling nodetool status is not the node that was reported as DN.

I'm a bit unclear about the last point about mtime/size of files and how to check; can you provide more information there?

Thanks for all the help, I really appreciate it.
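Roughly the kind of poller described above, sketched with a fabricated
nodetool status sample and a placeholder Slack webhook URL (a real version
would run nodetool status on a schedule, e.g. from cron):

```shell
# Fabricated `nodetool status` sample; host IDs and addresses are made up.
status='Datacenter: dc1
--  Address     Load     Tokens  Owns   Host ID                               Rack
UN  10.0.0.11   3.2 TB   256     50.0%  6053ec72-0000-0000-0000-000000000001  r1
DN  10.0.0.12   3.4 TB   256     50.0%  6053ec72-0000-0000-0000-000000000002  r1'

# Any line whose first column is DN is a down node.
down=$(echo "$status" | awk '$1 == "DN" { print $2 }')

if [ -n "$down" ]; then
    echo "would alert slack: nodes down: $down"
    # A real poller would post to a webhook, e.g.:
    # curl -s -X POST -H 'Content-type: application/json' \
    #      --data "{\"text\":\"cassandra DN: $down\"}" "$SLACK_WEBHOOK_URL"
fi
```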
  


Re: Restarting nodes and reported load

2017-06-01 Thread Victor Chen
Hi Daniel,

In my experience when a node shows DN and then comes back up by itself, that
sounds like some sort of gc pause (especially if nodetool status, when run
from the "DN" node itself, shows it is up -- assuming there isn't a spotty
network issue). Perhaps I missed this info due to the length of the thread,
but have you shared info about the following?

   - cpu/memory usage of affected nodes (are all nodes affected comparably,
   or some more than others?)
   - nodetool compactionstats and tpstats output (especially as the )
   - what is your heap size set to?
   - system.log and gc.logs: for investigating node "DN" symptoms I will
   usually start by noting the timestamp of the "123.56.78.901 is now DOWN"
   entries in system.log of other nodes to tell me where to look in system.log
   of the node in question. Then it's a question of answering "what was this node doing
   up to that point?"
   - mtime/size of files in data directory-- which files are growing in
   size?

That will help reduce how much we need to speculate. I don't think you
should need to restart cassandra every X days if things are optimally
configured for your read/write pattern-- at least I would not want to use
something where that is the normal expected behavior (and I don't believe
cassandra is one of those sorts of things).


Re: Restarting nodes and reported load

2017-06-01 Thread daemeon reiydelle
Some random thoughts; I would like to thank you for giving us an
interesting problem. Cassandra can get boring sometimes, it is too stable.

- Do you have a way to monitor the network traffic to see if it is
increasing between restarts or does it seem relatively flat?
- What activities are happening when you observe the (increasing)
latencies? Something must be writing to keyspaces, something I presume is
reading. What is the workload?
- when using SSDs, there are some OS/device optimizations for SSDs. I wonder
if those were done (missing them will cause some IO latency, but not like this)
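A dry-run sketch of the device-level settings typically meant here (the IO
scheduler and readahead are the usual suspects). The commands are collected
and printed rather than executed, since applying them needs root, and the
device name sdb is only an example:

```shell
# Print, don't run: device tuning commonly checked for SSD-backed nodes.
dev=sdb   # example device name -- substitute your data volume
cmds="cat /sys/block/$dev/queue/scheduler        # noop or deadline suits SSDs
echo deadline > /sys/block/$dev/queue/scheduler
blockdev --getra /dev/$dev                       # current readahead in sectors
blockdev --setra 8 /dev/$dev                     # low readahead for random IO"
echo "$cmds"
```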







Daemeon C.M. Reiydelle / USA (+1) 415.501.0198 / London (+44) (0) 20 8144 9872





Re: Restarting nodes and reported load

2017-06-01 Thread Daniel Steuernol
I am just restarting cassandra. I'm not having any disk space issues I think, but we're having issues where operations have increased latency, and these are fixed by a restart. It seemed like the load reported by nodetool status might be helpful in understanding what is going wrong, but I'm not sure. Another symptom is that nodes will report as DN in nodetool status and then come back up again just a minute later.

I'm not really sure what to track to find out what exactly is going wrong on the cluster, so any insight or debugging techniques would be super helpful.
  




Re: Restarting nodes and reported load

2017-05-31 Thread Anthony Grasso
Hi Daniel,

When you say that the nodes have to be restarted, are you just restarting
the Cassandra service or are you restarting the machine?
How are you reclaiming disk space at the moment? Does disk space free up
after the restart?

Regarding storage on nodes, keep in mind the more data stored on a node,
the longer some operations to maintain that data will take to complete. In
addition, the more data that is on each node, the longer it will take to
stream data to other nodes. Whether it is replacing a down node or
inserting a new node, having a large amount of data on each node will mean
that it takes longer for a node to join the cluster if it is streaming the
data.

Kind regards,
Anthony

On 30 May 2017 at 02:43, Daniel Steuernol  wrote:

> The cluster is running with RF=3, right now each node is storing about 3-4
> TB of data. I'm using r4.2xlarge EC2 instances, these have 8 vCPU's, 61 GB
> of RAM, and the disks attached for the data drive are gp2 ssd ebs volumes
> with 10k iops. I guess this brings up the question of what's a good marker
> to decide on whether to increase disk space vs provisioning a new node?
>
>
>
> On May 29 2017, at 9:35 am, tommaso barbugli  wrote:
>
>> Hi Daniel,
>>
>> This is not normal. Possibly a capacity problem. What's the RF, how much
>> data do you store per node, and what kind of servers do you use (core count,
>> RAM, disk, ...)?
>>
>> Cheers,
>> Tommaso
>>
>> On Mon, May 29, 2017 at 6:22 PM, Daniel Steuernol 
>> wrote:
>>
>>
>> I am running a 6 node cluster, and I have noticed that the reported load
>> on each node rises throughout the week and grows way past the actual disk
>> space used and available on each node. Also eventually latency for
>> operations suffers and the nodes have to be restarted. A couple questions
>> on this, is this normal? Also does cassandra need to be restarted every few
>> days for best performance? Any insight on this behaviour would be helpful.
>>
>> Cheers,
>> Daniel
>> - To
>> unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For
>> additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>
>


Re: Restarting nodes and reported load

2017-05-30 Thread Jonathan Haddad
You're the only one I see in the thread that's made any reference to HDFS.
The OP even noted that his question is about C*, not HDFS.

On Tue, May 30, 2017 at 2:59 PM daemeon reiydelle 
wrote:

> Did you notice that HDFS is the distributed file system used?
>
>
>
>
>
>
>
> *“All men dream, but not equally. Those who dream by night in the dusty
> recesses of their minds wake up in the day to find it was vanity, but the
> dreamers of the day are dangerous men, for they may act their dreams with
> open eyes, to make it possible.” — T.E. Lawrence*
>
>
> On Tue, May 30, 2017 at 2:18 PM, Jonathan Haddad 
> wrote:
>
>> This isn't an HDFS mailing list.
>>
>> On Tue, May 30, 2017 at 2:14 PM daemeon reiydelle 
>> wrote:
>>
>>> no, 3tb is small. 30-50tb of hdfs space is typical these days per hdfs
>>> node. Depends somewhat on whether there is a mix of more and less
>>> frequently accessed data. But even storing only hot data, never saw
>>> anything less than 20tb hdfs per node.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, May 30, 2017 at 2:00 PM, tommaso barbugli 
>>> wrote:
>>>
 Am I the only one thinking 3TB is way too much data for a single node
 on a VM?

 On Tue, May 30, 2017 at 10:36 PM, Daniel Steuernol <
 dan...@sendwithus.com> wrote:

> I don't believe incremental repair is enabled; I have never enabled it
> on the cluster, and unless it's on by default, it is off. Also I don't
> see a setting in cassandra.yaml for it.
>
>
>
> On May 30 2017, at 1:10 pm, daemeon reiydelle 
> wrote:
>
>> Unless there is a bug, snapshots are excluded (they are not HDFS
>> anyway!) from nodetool status.
>>
>> Out of curiousity, is incremenatal repair enabled? This is almost
>> certainly a rat hole, but there was an issue a few releases back where 
>> load
>> would only increase until the node was restarted. Had been fixed ages 
>> ago,
>> but wondering what happens if you restart a node, IF you have incremental
>> enabled.
>>
>>
>>
>>
>>
>>
>>
>> On Tue, May 30, 2017 at 12:15 PM, Varun Gupta 
>> wrote:
>>
>> Can you please check if you have incremental backup enabled and
>> snapshots are occupying the space.
>>
>> run nodetool clearsnapshot command.
>>
>> On Tue, May 30, 2017 at 11:12 AM, Daniel Steuernol <
>> dan...@sendwithus.com> wrote:
>>
>> It's 3-4TB per node, and by load rises, I'm talking about load as
>> reported by nodetool status.
>>
>>
>>
>> On May 30 2017, at 10:25 am, daemeon reiydelle 
>> wrote:
>>
>> When you say "the load rises ... ", could you clarify what you mean
>> by "load"? That has a specific Linux term, and in e.g. Cloudera Manager.
>> But in neither case would that be relevant to transient or persisted 
>> disk.
>> Am I missing something?
>>
>>
>> On Tue, May 30, 2017 at 10:18 AM, tommaso barbugli <
>> tbarbu...@gmail.com> wrote:
>>
>> 3-4 TB per node or in total?
>>
>> On Tue, May 30, 2017 at 6:48 PM, Daniel Steuernol <
>> dan...@sendwithus.com> wrote:
>>
>> I should also mention that I am running cassandra 3.10 on the cluster
>>
>>
>>
>> On May 29 2017, at 9:43 am, Daniel Steuernol 
>> wrote:
>>
>> The cluster is running with RF=3, right now each node is storing
>> about 3-4 TB of data. I'm using r4.2xlarge EC2 instances, these have 8
>> vCPU's, 61 GB of RAM, and the disks attached for the data drive are gp2 
>> ssd
>> ebs volumes with 10k iops. I guess this brings up the question of what's 
>> a
>> good marker to decide on whether to increase disk space vs provisioning a
>> new node?
>>
>>
>> On May 29 2017, at 9:35 am, 

Re: Restarting nodes and reported load

2017-05-30 Thread daemeon reiydelle
Did you notice that HDFS is the distributed file system used?





*Daemeon C.M. Reiydelle, USA (+1) 415.501.0198, London (+44) (0) 20 8144 9872*


*“All men dream, but not equally. Those who dream by night in the dusty
recesses of their minds wake up in the day to find it was vanity, but the
dreamers of the day are dangerous men, for they may act their dreams with
open eyes, to make it possible.” — T.E. Lawrence*



Re: Restarting nodes and reported load

2017-05-30 Thread Jonathan Haddad
Daniel - my comment wasn't to you, it was in response to Daemeon.

> no, 3tb is small. 30-50tb of hdfs space is typical these days per hdfs
node

Jon


Re: Restarting nodes and reported load

2017-05-30 Thread Daniel Steuernol
My question is about Cassandra. Ultimately I'm trying to figure out why our cluster's performance degrades approximately every 6 days. I noticed that the load as reported by nodetool status was very high, but that might be unrelated to the problem. A restart solves the performance problem.

I've attached a latency graph for inserts into the cluster; as you can see, over the weekend there was a massive latency spike, and it was fixed by a restart of all the nodes.
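For tracking the nodetool-reported load over time, rather than eyeballing it before and after restarts, the Load column can be scraped programmatically. A minimal sketch, with the column layout assumed from typical Cassandra 3.x `nodetool status` output; adjust the regex if your version formats columns differently:

```python
import re

# Multipliers for nodetool's human-readable size units.
UNITS = {"KB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12}

def parse_load(status_output):
    """Extract {address: load_in_bytes} from `nodetool status` output.

    The column layout (Status/State flag, Address, Load, ...) is assumed
    from typical Cassandra 3.x output.
    """
    loads = {}
    row = re.compile(r"^[UD][NLJM]\s+(\S+)\s+([\d.]+)\s+(KB|MB|GB|TB)")
    for line in status_output.splitlines():
        m = row.match(line.strip())
        if m:
            addr, value, unit = m.groups()
            loads[addr] = float(value) * UNITS[unit]
    return loads

sample = """\
Datacenter: dc1
===============
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address   Load     Tokens  Owns  Host ID  Rack
UN  10.0.0.1  3.52 TB  256     ?     aaaa     r1
UN  10.0.0.2  3.97 TB  256     ?     bbbb     r1
"""
print(parse_load(sample))
```

Logging these values next to `df` output on each node would show whether reported load diverges from real disk usage between restarts.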
  


Re: Restarting nodes and reported load

2017-05-30 Thread Jonathan Haddad
This isn't an HDFS mailing list.

On Tue, May 30, 2017 at 2:14 PM daemeon reiydelle 
wrote:

> no, 3tb is small. 30-50tb of hdfs space is typical these days per hdfs
> node. Depends somewhat on whether there is a mix of more and less
> frequently accessed data. But even storing only hot data, never saw
> anything less than 20tb hdfs per node.
>
>
>
>
>
> *Daemeon C.M. Reiydelle, USA (+1) 415.501.0198, London (+44) (0) 20 8144 9872*
>
>
>
>
> On Tue, May 30, 2017 at 2:00 PM, tommaso barbugli 
> wrote:
>
>> Am I the only one thinking 3TB is way too much data for a single node on
>> a VM?
>>
>> On Tue, May 30, 2017 at 10:36 PM, Daniel Steuernol > > wrote:
>>
>>> I don't believe incremental repair is enabled, I have never enabled it
>>> on the cluster, and unless it's the default then it is off. Also I don't
>>> see a setting in cassandra.yaml for it.
>>>
>>>
>>>
>>> On May 30 2017, at 1:10 pm, daemeon reiydelle 
>>> wrote:
>>>
 Unless there is a bug, snapshots are excluded (they are not HDFS
 anyway!) from nodetool status.

 Out of curiosity, is incremental repair enabled? This is almost
 certainly a rat hole, but there was an issue a few releases back where load
 would only increase until the node was restarted. Had been fixed ages ago,
 but wondering what happens if you restart a node, IF you have incremental
 enabled.





 *Daemeon C.M. Reiydelle, USA (+1) 415.501.0198, London (+44) (0) 20 8144 9872*




 On Tue, May 30, 2017 at 12:15 PM, Varun Gupta  wrote:

 Can you please check if you have incremental backup enabled and
 snapshots are occupying the space.

 run nodetool clearsnapshot command.

 On Tue, May 30, 2017 at 11:12 AM, Daniel Steuernol <
 dan...@sendwithus.com> wrote:

 It's 3-4TB per node, and by load rises, I'm talking about load as
 reported by nodetool status.



 On May 30 2017, at 10:25 am, daemeon reiydelle 
 wrote:

 When you say "the load rises ... ", could you clarify what you mean by
 "load"? That has a specific Linux term, and in e.g. Cloudera Manager. But
 in neither case would that be relevant to transient or persisted disk. Am I
 missing something?


 On Tue, May 30, 2017 at 10:18 AM, tommaso barbugli  wrote:

 3-4 TB per node or in total?

 On Tue, May 30, 2017 at 6:48 PM, Daniel Steuernol <
 dan...@sendwithus.com> wrote:

 I should also mention that I am running cassandra 3.10 on the cluster



 On May 29 2017, at 9:43 am, Daniel Steuernol 
 wrote:

 The cluster is running with RF=3, right now each node is storing about
 3-4 TB of data. I'm using r4.2xlarge EC2 instances; these have 8 vCPUs, 61
 GB of RAM, and the disks attached for the data drive are gp2 ssd ebs
 volumes with 10k iops. I guess this brings up the question of what's a good
 marker to decide on whether to increase disk space vs provisioning a new
 node?


 On May 29 2017, at 9:35 am, tommaso barbugli 
 wrote:

 Hi Daniel,

 This is not normal. Possibly a capacity problem. What's the RF, how much
 data do you store per node and what kind of servers do you use (core count,
 RAM, disk, ...)?

 Cheers,
 Tommaso

 On Mon, May 29, 2017 at 6:22 PM, Daniel Steuernol <
 dan...@sendwithus.com> wrote:


 I am running a 6 node cluster, and I have noticed that the reported
 load on each node rises throughout the week and grows way past the actual
 disk space used and available on each node. Also eventually latency for
 operations suffers and the nodes have to be restarted. A couple questions
 on this, is this normal? Also does cassandra need to be restarted every few
 days for best performance? Any insight on this behaviour would be helpful.

 Cheers,
 Daniel
 -
 To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For
 additional commands, e-mail: 
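Before running `nodetool clearsnapshot`, it can help to confirm that snapshots actually account for the space, since reported load excludes them. Snapshots live under each table's `snapshots/` directory in the data dir. A hedged sketch of summing them (the directory layout is Cassandra's standard one, and hard-linked snapshot files can overcount space shared with live SSTables); on recent versions `nodetool listsnapshots` reports much the same thing:

```python
import os
import tempfile

def snapshot_bytes(data_dir):
    """Sum sizes of files under any 'snapshots' directory below data_dir.

    Assumes the standard on-disk layout:
    <data_dir>/<keyspace>/<table>/snapshots/<name>/...
    """
    total = 0
    for root, _dirs, files in os.walk(data_dir):
        if "snapshots" in root.split(os.sep):
            total += sum(os.path.getsize(os.path.join(root, f)) for f in files)
    return total

# Demo against a throwaway directory mimicking the layout.
data_dir = tempfile.mkdtemp()
snap = os.path.join(data_dir, "ks1", "table1", "snapshots", "backup1")
os.makedirs(snap)
with open(os.path.join(snap, "mc-1-big-Data.db"), "wb") as f:
    f.write(b"x" * 1024)                      # snapshot file: counted
with open(os.path.join(data_dir, "ks1", "table1", "mc-2-big-Data.db"), "wb") as f:
    f.write(b"y" * 2048)                      # live SSTable: not counted
print(snapshot_bytes(data_dir))  # 1024
```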

Re: Restarting nodes and reported load

2017-05-30 Thread daemeon reiydelle
no, 3tb is small. 30-50tb of hdfs space is typical these days per hdfs
node. Depends somewhat on whether there is a mix of more and less
frequently accessed data. But even storing only hot data, never saw
anything less than 20tb hdfs per node.





*Daemeon C.M. Reiydelle, USA (+1) 415.501.0198, London (+44) (0) 20 8144 9872*






Re: Restarting nodes and reported load

2017-05-30 Thread tommaso barbugli
Am I the only one thinking 3TB is way too much data for a single node on a
VM?
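Whether 3 TB per node is too much can be framed with quick arithmetic: at RF=3 every row is stored three times, so per-node density is unique data times RF divided by node count. The commonly cited 1-2 TB per-node comfort zone is a rule of thumb assumed here, not a hard limit; it is driven mostly by compaction, repair, and bootstrap times:

```python
def per_node_tb(unique_data_tb, rf, nodes):
    """Per-node data density: every row is stored rf times across the cluster."""
    return unique_data_tb * rf / nodes

# The thread's cluster: 6 nodes at RF=3 holding ~3-4 TB each implies
# roughly 6-8 TB of unique data. Taking 7 TB as a midpoint:
print(per_node_tb(7.0, rf=3, nodes=6))   # 3.5 TB per node
# Doubling the node count halves the density:
print(per_node_tb(7.0, rf=3, nodes=12))  # 1.75 TB per node
```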



Re: Restarting nodes and reported load

2017-05-30 Thread daemeon reiydelle
No degradation.





*Daemeon C.M. Reiydelle, USA (+1) 415.501.0198, London (+44) (0) 20 8144 9872*




On Tue, May 30, 2017 at 1:54 PM, Daniel Steuernol 
wrote:

> That does sound like what's happening, did performance degrade as the
> reported load increased?
>
>
>
> On May 30 2017, at 1:52 pm, daemeon reiydelle  wrote:
>
>> OK, thanks.
>>
>> So there was a bug in a prior version of C*, symptoms were:
>>
>> Nodetool would show increasing load utilization over time. Stopping and
>> restarting C* nodes would reset the storage back to what one would expect
>> on that node, for a while, then it would creep upwards again, until the
>> node(s) are restarted, etc. FYI it ONLY occurred on an in-use system, etc.
>>
>> I know (double checked) that the problem was fixed a while back.
>> Wondering if it resurfaced?
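The creep-until-restart symptom described here can be watched for by periodically comparing nodetool's reported load against actual filesystem usage: if the ratio drifts well above 1.0 while disk usage stays flat, you are likely seeing reported-load drift rather than real growth. A minimal sketch, where the threshold and sample format are illustrative assumptions:

```python
def load_drift(samples, threshold=1.25):
    """Flag reported-load drift.

    samples: chronological (reported_load_bytes, disk_used_bytes) pairs,
    e.g. taken from `nodetool status` and `df`. Returns True when the
    latest reported load exceeds actual disk usage by more than
    `threshold`. The 1.25 cutoff is an illustrative assumption.
    """
    reported, used = samples[-1]
    return used > 0 and reported / used > threshold

history = [
    (3.0e12, 3.0e12),  # healthy: reported load matches on-disk usage
    (3.6e12, 3.1e12),  # starting to drift
    (4.5e12, 3.2e12),  # reported ~1.4x actual: worth investigating
]
print(load_drift(history))  # True
```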

Re: Restarting nodes and reported load

2017-05-30 Thread Daniel Steuernol
That does sound like what's happening. Did performance degrade as the reported load increased?
  

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org

Re: Restarting nodes and reported load

2017-05-30 Thread daemeon reiydelle
OK, thanks.

So there was a bug in a prior version of C*; the symptoms were:

Nodetool would show increasing load over time. Stopping and restarting C*
nodes would reset the reported storage back to what one would expect on
that node for a while; then it would creep upwards again until the node(s)
were restarted. FYI, it ONLY occurred on an in-use system.

I know (I double-checked) that the problem was fixed a while back.
Wondering if it resurfaced?





Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

“All men dream, but not equally. Those who dream by night in the dusty
recesses of their minds wake up in the day to find it was vanity, but the
dreamers of the day are dangerous men, for they may act their dreams with
open eyes, to make it possible.” — T.E. Lawrence




Re: Restarting nodes and reported load

2017-05-30 Thread Daniel Steuernol
I don't believe incremental repair is enabled; I have never enabled it on the cluster, and unless it's on by default, it is off. Also, I don't see a setting for it in cassandra.yaml.
  


Re: Restarting nodes and reported load

2017-05-30 Thread daemeon reiydelle
Unless there is a bug, snapshots are excluded (they are not HDFS anyway!)
from the load reported by nodetool status.

Out of curiosity, is incremental repair enabled? This is almost certainly
a rat hole, but there was an issue a few releases back where load would
only increase until the node was restarted. It was fixed ages ago, but I'm
wondering what happens if you restart a node, IF you have incremental
repair enabled.







Re: Restarting nodes and reported load

2017-05-30 Thread Daniel Steuernol
Incremental backup is set to false in the config file; I have also set snapshot_before_compaction and auto_snapshot to false. I ran nodetool clearsnapshot, but before doing that I ran nodetool listsnapshots and it listed a bunch of snapshots. I would have expected that list to be empty because I've disabled auto_snapshot.
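Since nodetool load excludes snapshot data, one way to see how much disk leftover snapshots are actually holding is to sum the snapshots/ directories on disk. A minimal sketch; the data-directory layout and file names below are fabricated for illustration, and on a real node you would point `datadir` at one of your data_file_directories instead:

```shell
# Illustrative only: snapshots live under <datadir>/<keyspace>/<table>/snapshots/.
# Build a miniature fake data directory (hypothetical keyspace/table names),
# then total the bytes held by snapshot files -- the space clearsnapshot frees.
datadir=$(mktemp -d)
mkdir -p "$datadir/my_ks/my_table/snapshots/1496160000"
head -c 1024 /dev/zero > "$datadir/my_ks/my_table/snapshots/1496160000/mc-1-big-Data.db"

# Sum the size of every file under a snapshots directory
total=$(find "$datadir" -path '*/snapshots/*' -type f -exec cat {} + | wc -c)
echo "snapshot bytes: $total"
```

On a live node, running the same find over the real data directory should roughly match the sizes nodetool listsnapshots reports.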
  



Re: Restarting nodes and reported load

2017-05-30 Thread Varun Gupta
Can you please check whether you have incremental backup enabled and
whether snapshots are occupying the space.

Run the nodetool clearsnapshot command.



Re: Restarting nodes and reported load

2017-05-30 Thread Daniel Steuernol
It's 3-4 TB per node, and by "load rises" I'm talking about the load as reported by nodetool status.
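For anyone following along: that Load column in nodetool status is Cassandra's view of live SSTable data, not total disk usage (snapshots, for instance, are excluded). A quick sketch of pulling the per-node figure out of the output; the sample text below is invented, not from this cluster:

```shell
# Hardcoded sample of `nodetool status` output (addresses and sizes are made up)
status='Datacenter: dc1
--  Address    Load      Tokens  Owns   Host ID   Rack
UN  10.0.0.1   3.4 TiB   256     50.0%  0000...   r1
UN  10.0.0.2   3.9 TiB   256     50.0%  1111...   r1'

# Print address and reported load for each UN (Up/Normal) row; on a live node
# this would be: nodetool status | awk '$1 == "UN" {print $2, $3, $4}'
echo "$status" | awk '$1 == "UN" {print $2, $3, $4}'
```

Tracking this figure over time against `df` on the same host is an easy way to spot the divergence described in this thread.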
  



Re: Restarting nodes and reported load

2017-05-30 Thread daemeon reiydelle
When you say "the load rises ...", could you clarify what you mean by
"load"? That term has a specific meaning in Linux and in e.g. Cloudera
Manager, but in neither case would it be relevant to transient or
persisted disk. Am I missing something?




Re: Restarting nodes and reported load

2017-05-30 Thread tommaso barbugli
3-4 TB per node or in total?



Re: Restarting nodes and reported load

2017-05-30 Thread Daniel Steuernol
I should also mention that I am running Cassandra 3.10 on the cluster.
  



Re: Restarting nodes and reported load

2017-05-29 Thread Daniel Steuernol
The cluster is running with RF=3; right now each node is storing about 3-4 TB of data. I'm using r4.2xlarge EC2 instances; these have 8 vCPUs, 61 GB of RAM, and the disks attached for the data drive are gp2 SSD EBS volumes with 10K IOPS. I guess this brings up the question of what's a good marker for deciding whether to increase disk space vs. provision a new node?
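On the disk-vs-new-node question, one common rule of thumb (not from this thread) is to keep disk utilization around 50%, since worst-case size-tiered compaction can temporarily need about as much free space as the data it is rewriting. A sketch with made-up numbers:

```shell
# Illustrative headroom check; both figures are hypothetical examples and the
# 50% target is only a rule of thumb for size-tiered compaction.
data_gb=3500                        # reported load per node (~3.5 TB)
disk_gb=8000                        # provisioned volume size (hypothetical)
max_usable_gb=$(( disk_gb / 2 ))    # leave ~50% free for compaction/snapshots
if [ "$data_gb" -gt "$max_usable_gb" ]; then
    echo "consider adding nodes or larger disks"
else
    echo "headroom ok"
fi
```

With these example numbers the check passes; at the 3-4 TB per node reported here, the answer depends entirely on the actual volume size.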
  



Re: Restarting nodes and reported load

2017-05-29 Thread tommaso barbugli
Hi Daniel,

This is not normal; possibly a capacity problem. What's the RF, how much
data do you store per node, and what kind of servers do you use (core
count, RAM, disk, ...)?

Cheers,
Tommaso

On Mon, May 29, 2017 at 6:22 PM, Daniel Steuernol 
wrote:

>
> I am running a 6 node cluster, and I have noticed that the reported load
> on each node rises throughout the week and grows way past the actual disk
> space used and available on each node. Eventually, latency for operations
> suffers and the nodes have to be restarted. A couple of questions on this:
> is this normal? Does Cassandra need to be restarted every few days for
> best performance? Any insight on this behaviour would be helpful.
>
> Cheers,
> Daniel