Regular NullPointerExceptions from `nodetool compactionstats` on 3.7 node

2018-04-20 Thread Paul Pollack
Hi all, We have a cluster running on Cassandra 3.7 (we already know this is considered a "bad" version and plan to upgrade to 3.11 in the not-too-distant future) and we have a few Nagios checks that run `nodetool compactionstats` to check how many pending compactions there currently are, as well
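For readers unfamiliar with this kind of check, here is a minimal sketch of a Nagios-style wrapper around `nodetool compactionstats`. It is not the poster's actual check: the function names and the warning/critical thresholds are hypothetical, and it assumes the command prints a `pending tasks: N` line. Anything else, including a stack trace such as the NullPointerException in the subject, is reported as UNKNOWN rather than as a healthy node.

```python
import re
import subprocess
from typing import Optional, Tuple

# Hypothetical thresholds, chosen only for illustration.
WARN_PENDING = 50
CRIT_PENDING = 200

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_pending_tasks(output: str) -> Optional[int]:
    """Return the number from a 'pending tasks: N' line, or None if absent."""
    match = re.search(r"^pending tasks:\s*(\d+)", output, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def nagios_status(pending: Optional[int]) -> Tuple[int, str]:
    """Map a pending-compaction count to a Nagios exit code and message."""
    if pending is None:
        return UNKNOWN, "UNKNOWN - could not read pending compactions"
    if pending >= CRIT_PENDING:
        return CRITICAL, f"CRITICAL - {pending} pending compactions"
    if pending >= WARN_PENDING:
        return WARNING, f"WARNING - {pending} pending compactions"
    return OK, f"OK - {pending} pending compactions"


def run_check() -> Tuple[int, str]:
    """Run nodetool and classify its output; any failure maps to UNKNOWN."""
    try:
        result = subprocess.run(
            ["nodetool", "compactionstats"],
            capture_output=True,
            text=True,
            timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired):
        return nagios_status(None)
    return nagios_status(parse_pending_tasks(result.stdout))
```

A caller would print the message from `run_check()` and exit with the returned code. Treating unparseable output as UNKNOWN keeps an intermittent nodetool error from being read as zero pending compactions.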

Re: What is a node's "counter ID?"

2017-10-23 Thread Paul Pollack
counters in 2.1, and the assignment of the id would > basically be a format migration. > > > On Oct 20, 2017, at 9:57 AM, Paul Pollack <paul.poll...@klaviyo.com> > wrote: > > Hi, > > I was reading the doc page for nodetool cleanup > https://docs.datastax.com/en/cassandra

What is a node's "counter ID?"

2017-10-20 Thread Paul Pollack
Hi, I was reading the doc page for nodetool cleanup https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCleanup.html because I was planning to run it after replacing a node in my counter cluster and the sentence "Cassandra assigns a new counter ID to the node" gave me pause. I can't

Re: Drastic increase in disk usage after starting repair on 3.7

2017-09-21 Thread Paul Pollack
to replace that node. On Thu, Sep 21, 2017 at 7:58 AM, Paul Pollack <paul.poll...@klaviyo.com> wrote: > Thanks for the suggestions guys. > > Nicolas, I just checked nodetool listsnapshots and it doesn't seem like > those are causing the increase: > > Snapshot

Re: Drastic increase in disk usage after starting repair on 3.7

2017-09-21 Thread Paul Pollack
Thanks for the suggestions, guys. Nicolas, I just checked nodetool listsnapshots and it doesn't seem like those are causing the increase: Snapshot Details: Snapshot name Keyspace name Column family name True size Size on disk

Re: Drastic increase in disk usage after starting repair on 3.7

2017-09-20 Thread Paul Pollack
Just a quick additional note: we have checked and this is the only node in the cluster exhibiting this behavior; disk usage is steady on all the others. CPU load on the repairing node is slightly higher but nothing significant. On Wed, Sep 20, 2017 at 9:08 PM, Paul Pollack <paul.p

Drastic increase in disk usage after starting repair on 3.7

2017-09-20 Thread Paul Pollack
Hi, I'm running a repair on a node in my 3.7 cluster and today got alerted on disk space usage. We keep the data and commit log directories on separate EBS volumes. The data volume is 2TB. The node went down due to EBS failure on the commit log drive. I stopped the instance and was later told by

Question about counters read before write behavior

2017-09-17 Thread Paul Pollack
Hi, We're trying to confirm whether a counter write reads the entire partition from disk or just the row and column of the partition being incremented. We've traced the code to this line

Re: Bootstrapping node on Cassandra 3.7 causes cluster-wide performance issues

2017-09-11 Thread Paul Pollack
Thanks again guys, this has been a major blocker for us and I think we've made some major progress with your advice. We have gone ahead with Lerh's suggestion and the cluster is operating much more smoothly while the new node compacts. We read at quorum, so in the event that we don't make it

Re: Bootstrapping node on Cassandra 3.7 causes cluster-wide performance issues

2017-09-11 Thread Paul Pollack
Thanks for the responses Lerh and Kurt! Lerh - We had been considering those particular nodetool commands but were hesitant to perform them on a production node without either testing adequately in a dev environment or getting some feedback from someone who knew what they were doing (such as

Bootstrapping node on Cassandra 3.7 causes cluster-wide performance issues

2017-09-11 Thread Paul Pollack
Hi, We run a 48-node cluster that stores counts in wide rows. Each node uses roughly 1TB of space on a 2TB EBS gp2 drive for the data directory, with LeveledCompactionStrategy. We have been trying to bootstrap new nodes that use a RAID 0 configuration over two 1TB EBS drives to increase I/O throughput cap

Re: Cassandra 3.7 repair error messages

2017-09-11 Thread Paul Pollack
ght need to > configure TCP keep_alive. 33 hours sounds like a really long time. Have you > successfully run a repair on this cluster before? > > On Thu, Aug 31, 2017 at 11:39 AM, Paul Pollack <paul.poll...@klaviyo.com> > wrote: > >> Hi, >> >> I'm trying to run a

Cassandra 3.7 repair error messages

2017-08-30 Thread Paul Pollack
Hi, I'm trying to run a repair on a node in my Cassandra cluster, version 3.7, and was hoping someone may be able to shed light on an error message that keeps cropping up. I started the repair on a node after discovering that it somehow became partitioned from the rest of the cluster, e.g. nodetool