
Re: Cassandra Node keep going down

2017-07-17 Thread Jeff Jirsa



On 2017-07-14 11:23 (-0700), "Harika Vangapelli -T (hvangape - AKRAYA INC at Cisco)" wrote:
> We are using Cassandra 3.x.
> 

Which 3.x version? 3.11.0? 3.0.14? 3.7? The exact version is important.

> Recently, our production database has been going through some instability
> issues. One of our nodes keeps going down, anywhere from once every 2 days to
> a few times a day. The node goes down because the JVM runs out of memory.
> Based on my investigation, I suspect this might be related to writing and/or
> compacting large partitions in some of our large data tables. Here's what
> might have happened:
> 1. The node went OOM because it could not deserialize or compact some large
> partitions under certain conditions, due to memory constraints.
> 2. Once we restarted it, which was usually a few hours later, the other
> nodes in the cluster tried to perform hinted handoff to the down node to
> patch the missing data. From then on, the down node had to handle the
> handoff plus the normal data load, which made it even busier.
> 3. The node was not able to complete the handoff and went down again.
> 4. This happened again and again.
> 

Sounds like it's always the same node? You may want to try running 'nodetool 
scrub' on that node and watching logs for errors that may indicate a corrupt 
file on disk, which would cause the behavior you're seeing.
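For the "watch the logs" half of that suggestion, here is a minimal Python sketch that prints the system.log lines mentioning Cassandra's corruption exceptions. The pattern list is an assumption (these exception classes exist in Cassandra, but the list is not exhaustive), and so is the log location: package installs typically write to /var/log/cassandra/system.log, but yours may differ.

```python
import sys
from typing import Iterable, List, Tuple

# Substrings that *may* indicate a damaged SSTable on disk. Assumed list:
# both classes exist in Cassandra, but other errors can also mean corruption.
SUSPECT_PATTERNS = ("CorruptSSTableException", "CorruptBlockException")


def find_suspect_lines(
    lines: Iterable[str], patterns: Tuple[str, ...] = SUSPECT_PATTERNS
) -> List[Tuple[int, str]]:
    """Return (line_number, text) for each log line containing a suspect pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern in line for pattern in patterns):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_corruption.py /var/log/cassandra/system.log
    # (script name and path are illustrative assumptions)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for number, text in find_suspect_lines(log):
            print(f"{number}: {text}")
```

Running it on the failing node's log after the scrub finishes, and again after the next crash, shows whether the same SSTable files keep appearing.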



Cassandra Node keep going down

2017-07-14 Thread Harika Vangapelli -T (hvangape - AKRAYA INC at Cisco)

We are using Cassandra 3.x.

Recently, our production database has been going through some instability
issues. One of our nodes keeps going down, anywhere from once every 2 days to a
few times a day. The node goes down because the JVM runs out of memory. Based
on my investigation, I suspect this might be related to writing and/or
compacting large partitions in some of our large data tables. Here's what might
have happened:
1. The node went OOM because it could not deserialize or compact some large
partitions under certain conditions, due to memory constraints.
2. Once we restarted it, which was usually a few hours later, the other nodes
in the cluster tried to perform hinted handoff to the down node to patch the
missing data. From then on, the down node had to handle the handoff plus the
normal data load, which made it even busier.
3. The node was not able to complete the handoff and went down again.
4. This happened again and again.
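One way to test the large-partition suspicion: Cassandra 3.x logs a warning when it writes a partition larger than `compaction_large_partition_warning_threshold_mb` (set in cassandra.yaml) to an SSTable. The minimal Python sketch below counts those warnings per partition, so you can check whether the same keys recur on the failing node and compare it with the healthy ones. The warning wording in the regex is an assumption that varies between versions; check it against your own system.log.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Assumed shape of the warning (wording and size format differ by version):
#   WARN ... Writing large partition ks/table:key (152.3MiB) to sstable ...
LARGE_PARTITION = re.compile(
    r"Writing large partition ([^/\s]+)/([^:\s]+):(.+?) \(([^)]+)\)"
)


def count_large_partitions(lines: Iterable[str]) -> Counter:
    """Count how often each ks.table:key appears in a large-partition warning."""
    counts: Counter = Counter()
    for line in lines:
        match = LARGE_PARTITION.search(line)
        if match:
            keyspace, table, key, _size = match.groups()
            counts[f"{keyspace}.{table}:{key}"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (illustrative): python large_partitions.py /var/log/cassandra/system.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for partition, seen in count_large_partitions(log).most_common(20):
            print(f"{seen:5d}  {partition}")
```

Counting occurrences rather than only listing sizes makes it easier to spot a handful of keys that are rewritten on every compaction, which would fit the "same node, over and over" pattern.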

This is not the first time we've seen this issue. Last time, we fixed it by
manually stopping some of our aggregation jobs for a whole night to allow the
node to complete the handoff. We're not sure about the root cause yet, and we
have no explanation for why this happens to only one node. I investigated the
issue and found two related Cassandra JIRAs:
https://issues.apache.org/jira/browse/CASSANDRA-8269 and
https://issues.apache.org/jira/browse/CASSANDRA-8723

Both JIRAs mention that this might only be the case with Cassandra 2.x.

Thanks,

Harika


Harika Vangapelli
Engineer - IT
hvang...@cisco.com

Cisco Systems, Inc.

United States
cisco.com

