Re: Restarting nodes and reported load

Daniel Steuernol Tue, 30 May 2017 14:31:06 -0700

My question is about Cassandra. Ultimately, I'm trying to figure out why our cluster's performance degrades approximately every 6 days. I noticed that the load as reported by nodetool status was very high, but that might be unrelated to the problem. A restart solves the performance problem.

I've attached a latency graph for inserts into the cluster. As you can see, there was a massive latency spike over the weekend, and it was fixed by restarting all the nodes.

On May 30 2017, at 2:18 pm, Jonathan Haddad <j...@jonhaddad.com> wrote:

This isn't an HDFS mailing list.

On Tue, May 30, 2017 at 2:14 PM daemeon reiydelle <daeme...@gmail.com> wrote:
No, 3 TB is small. 30-50 TB of HDFS space per HDFS node is typical these days. It depends somewhat on whether there is a mix of more and less frequently accessed data. But even when storing only hot data, I never saw anything less than 20 TB of HDFS per node.

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

“All men dream, but not equally. Those who dream by night in the dusty recesses of their minds wake up in the day to find it was vanity, but the dreamers of the day are dangerous men, for they may act their dreams with open eyes, to make it possible.” — T.E. Lawrence

On Tue, May 30, 2017 at 2:00 PM, tommaso barbugli <tbarbu...@gmail.com> wrote:
Am I the only one thinking 3TB is way too much data for a single node on a VM?

On Tue, May 30, 2017 at 10:36 PM, Daniel Steuernol <dan...@sendwithus.com> wrote:
I don't believe incremental repair is enabled. I have never enabled it on the cluster, so unless it's the default, it is off. Also, I don't see a setting for it in cassandra.yaml.

On May 30 2017, at 1:10 pm, daemeon reiydelle <daeme...@gmail.com> wrote:

Unless there is a bug, snapshots are excluded (they are not HDFS anyway!) from nodetool status.

Out of curiosity, is incremental repair enabled? This is almost certainly a rat hole, but there was an issue a few releases back where load would only increase until the node was restarted. It was fixed ages ago, but I'm wondering what happens if you restart a node, IF you have incremental enabled.

On Tue, May 30, 2017 at 12:15 PM, Varun Gupta <var...@uber.com> wrote:
Can you please check whether you have incremental backup enabled and whether snapshots are occupying the space?

If so, run the nodetool clearsnapshot command.
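
Before clearing anything, it can help to see how much space snapshots and incremental backups actually hold. The following is only a minimal sketch, assuming the usual on-disk layout in which each table directory under the data directory contains `snapshots/` and `backups/` subdirectories. The function names and the example path are hypothetical and not part of Cassandra or nodetool.

```python
import os


def dir_size_bytes(path):
    """Sum the apparent sizes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def snapshot_and_backup_usage(data_dir):
    """Return (snapshot_bytes, backup_bytes) found under data_dir.

    Assumption: the layout is <data_dir>/<keyspace>/<table>/snapshots/...
    and <data_dir>/<keyspace>/<table>/backups/...
    Snapshot files are hard links to SSTables, so these apparent sizes can
    overstate the extra disk space they pin while the originals still exist.
    """
    snapshots = 0
    backups = 0
    for root, dirs, _files in os.walk(data_dir):
        for d in list(dirs):
            if d == "snapshots":
                snapshots += dir_size_bytes(os.path.join(root, d))
                dirs.remove(d)  # already counted, do not descend again
            elif d == "backups":
                backups += dir_size_bytes(os.path.join(root, d))
                dirs.remove(d)
    return snapshots, backups
```

For example, `snapshot_and_backup_usage("/var/lib/cassandra/data")` would report both totals in bytes; that path is an assumed default and should be replaced with the data directory configured in cassandra.yaml.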

On Tue, May 30, 2017 at 11:12 AM, Daniel Steuernol <dan...@sendwithus.com> wrote:
It's 3-4 TB per node, and by "load rises" I mean the load as reported by nodetool status.
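
To check whether the reported load really exceeds what is on disk, the Load value can be converted to bytes and compared with the used space of the data volume. This is a rough sketch only. It assumes the Load column is a number followed by a unit such as `GiB` or `TiB` (with `GB`/`TB` treated as binary units, as older releases printed them). The helper names are hypothetical.

```python
import shutil

# Assumed unit spellings; all are treated as powers of 1024.
_UNITS = {
    "bytes": 1,
    "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3, "TiB": 1024 ** 4,
    "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3, "TB": 1024 ** 4,
}


def parse_load(text):
    """Convert a load string such as '3.5 TiB' into a byte count."""
    value, unit = text.split()
    return int(float(value) * _UNITS[unit])


def load_exceeds_disk(load_text, used_bytes):
    """True if the reported load is larger than the bytes used on disk."""
    return parse_load(load_text) > used_bytes


def data_volume_used_bytes(path):
    """Bytes used on the filesystem that holds path (the whole volume)."""
    return shutil.disk_usage(path).used
```

A call such as `load_exceeds_disk("3.5 TiB", data_volume_used_bytes("/var/lib/cassandra/data"))` would flag the situation described in this thread; the path is an assumption, and the load string has to be copied from the nodetool status output by hand.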

On May 30 2017, at 10:25 am, daemeon reiydelle <daeme...@gmail.com> wrote:

When you say "the load rises ... ", could you clarify what you mean by "load"? That term has a specific meaning in Linux, and in e.g. Cloudera Manager. But in neither case would it be relevant to transient or persisted disk. Am I missing something?

On Tue, May 30, 2017 at 10:18 AM, tommaso barbugli <tbarbu...@gmail.com> wrote:
3-4 TB per node or in total?

On Tue, May 30, 2017 at 6:48 PM, Daniel Steuernol <dan...@sendwithus.com> wrote:
I should also mention that I am running Cassandra 3.10 on the cluster.

On May 29 2017, at 9:43 am, Daniel Steuernol <dan...@sendwithus.com> wrote:

The cluster is running with RF=3, and right now each node is storing about 3-4 TB of data. I'm using r4.2xlarge EC2 instances; these have 8 vCPUs and 61 GB of RAM, and the data drives are gp2 SSD EBS volumes with 10k IOPS. I guess this brings up the question: what's a good marker for deciding whether to increase disk space or provision a new node?

On May 29 2017, at 9:35 am, tommaso barbugli <tbarbu...@gmail.com> wrote:

Hi Daniel,

This is not normal. Possibly a capacity problem. What's the RF, how much data do you store per node, and what kind of servers do you use (core count, RAM, disk, ...)?

Cheers,
Tommaso

On Mon, May 29, 2017 at 6:22 PM, Daniel Steuernol <dan...@sendwithus.com> wrote:

I am running a 6-node cluster, and I have noticed that the reported load on each node rises throughout the week and grows way past the actual disk space used and available on each node. Eventually, latency for operations also suffers and the nodes have to be restarted. A couple of questions on this: is this normal? Also, does Cassandra need to be restarted every few days for best performance? Any insight on this behaviour would be helpful.

Cheers,
Daniel
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org


Attachment: Screen Shot 2017-05-30 at 2.20.04 PM.png
