Re: TWCS sstables not dropping even though all data is expired

2019-05-07 Thread Mike Torra
’ll compact > away most of the other data in those old sstables (but not the partition > that’s been manually updated) > > Also table level TTLs help catch this type of manual manipulation - > consider adding it if appropriate. > > -- > Jeff Jirsa > > > On May 6, 2

Re: TWCS sstables not dropping even though all data is expired

2019-05-06 Thread Mike Torra
> > On May 3, 2019, at 7:57 PM, Nick Hatfield > wrote: > > Hi Mike, > > > > If you will, share your compaction settings. More than likely, your issue > is from 1 of 2 reasons: > 1. You have read repair chance set to anything other than 0 > > 2. You’re running r

Re: TWCS sstables not dropping even though all data is expired

2019-05-03 Thread Mike Torra
ou did the major compaction. > > This would happen on all replicas of the data, hence the reason you see this > problem on 3 nodes. > > Thanks > > Paul > www.redshots.com > > On 3 May 2019, at 15:35, Mike Torra wrote: > > This does indeed seem to be a problem of overla

Re: TWCS sstables not dropping even though all data is expired

2019-05-03 Thread Mike Torra
WCS-part1.html the sections > towards the bottom of this post may well explain why the sstable is not > being deleted. > > Thanks > > Paul > www.redshots.com > > On 2 May 2019, at 16:08, Mike Torra wrote: > > I'm pretty stumped by this, so here is some more detail

Re: TWCS sstables not dropping even though all data is expired

2019-05-02 Thread Mike Torra
_info" : { "local_delete_time" : "2019-01-22T17:59:35Z" } } ] } ] } ``` As expected, almost all of the data except this one suspicious partition has a ttl and is already expired. But if a partition isn't expired and I see it in the sst

TWCS sstables not dropping even though all data is expired

2019-04-30 Thread Mike Torra
Hello - I have a 48 node C* cluster spread across 4 AWS regions with RF=3. A few months ago I started noticing disk usage on some nodes increasing consistently. At first I solved the problem by destroying the nodes and rebuilding them, but the problem returns. I did some more investigation
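The replies in this thread converge on two conditions an expired TWCS sstable must satisfy before it is dropped. A toy model (not Cassandra code; the overlap check is deliberately simplified) of why one unexpired write can pin an otherwise fully-expired sstable:

```python
import time

class SSTable:
    """Minimal stand-in for an sstable's timestamp metadata."""
    def __init__(self, name, min_ts, max_ts, max_local_delete_time):
        self.name = name
        self.min_ts = min_ts                      # oldest write timestamp
        self.max_ts = max_ts                      # newest write timestamp
        self.max_local_delete_time = max_local_delete_time

def droppable(table, others, now):
    # Condition 1: every cell must already be past its local deletion time.
    if table.max_local_delete_time > now:
        return False
    # Condition 2 (simplified): no other sstable may overlap its range.
    for other in others:
        if other.min_ts <= table.max_ts and other.max_ts >= table.min_ts:
            return False
    return True

now = time.time()
old = SSTable("mc-1-big", min_ts=1000, max_ts=2000,
              max_local_delete_time=now - 60)
# A manually-updated partition lands in a newer sstable whose timestamp
# range overlaps the old one, so the old sstable is never dropped:
new = SSTable("mc-2-big", min_ts=1500, max_ts=9000,
              max_local_delete_time=now + 3600)
print(droppable(old, [new], now))   # False
print(droppable(old, [], now))      # True
```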

nodejs client can't connect to two nodes with different private ip addresses in different dcs

2018-11-29 Thread Mike Torra
Hi Guys - I recently ran in to a problem (for the 2nd time) where my nodejs app for some reason refuses to connect to one node in my C* cluster. I noticed that in both cases, the node that was not receiving any client connections had the same private ip as another node in the cluster, but in a
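A toy reproduction (not the actual nodejs driver code) of the failure mode described here: if a client keys its host map by the private/rpc address and two nodes in different DCs report the same private IP, the second entry silently replaces the first, and one node never receives client connections:

```python
# Invented peer records for illustration; only the keying behavior matters.
peers = [
    {"dc": "us-east-1", "rpc_address": "10.0.2.15"},
    {"dc": "eu-west-1", "rpc_address": "10.0.2.15"},   # same private IP!
    {"dc": "us-east-1", "rpc_address": "10.0.2.16"},
]

host_map = {}
for peer in peers:
    host_map[peer["rpc_address"]] = peer   # collision: last writer wins

print(len(peers), "peers, but only", len(host_map), "hosts tracked")
```

Keying by `(dc, rpc_address)`, or having nodes advertise a unique broadcast address, avoids the collision.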

Re: high latency on one node after replacement

2018-03-27 Thread Mike Torra
? > > On Tue, Mar 27, 2018 at 2:24 PM, Mike Torra <mto...@salesforce.com> wrote: > >> Hi There - >> >> I have noticed an issue where I consistently see high p999 read latency >> on a node for a few hours after replacing the node. Before replacing the

high latency on one node after replacement

2018-03-27 Thread Mike Torra
Hi There - I have noticed an issue where I consistently see high p999 read latency on a node for a few hours after replacing the node. Before replacing the node, the p999 read latency is ~30ms, but after it increases to 1-5s. I am running C* 3.11.2 in EC2. I am testing out using EBS snapshots of
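EBS volumes restored from snapshots fetch each block lazily from S3 on first read, which is one plausible explanation for a few hours of elevated p999 after a replacement. A hedged sketch of pre-reading the data files once before the node takes traffic (the data directory path is an assumption; adjust to your layout):

```python
import os

DATA_DIR = "/var/lib/cassandra/data"   # assumed default; adjust as needed
CHUNK = 1 << 20                        # 1 MiB sequential reads

def prewarm(root):
    """Read every file under root once to hydrate lazily-loaded EBS blocks."""
    touched = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    while f.read(CHUNK):
                        pass
                touched += 1
            except OSError:
                pass  # file compacted away mid-walk; ignore
    return touched

if __name__ == "__main__":
    print("pre-warmed", prewarm(DATA_DIR), "files")
```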

Re: node restart causes application latency

2018-02-13 Thread Mike Torra
Then could it be that calling `nodetool drain` after calling `nodetool disablegossip` is what causes the problem? On Mon, Feb 12, 2018 at 6:12 PM, kurt greaves wrote: > > ​Actually, it's not really clear to me why disablebinary and thrift are > necessary prior to drain,
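The thread's working theory is that the relative order of `drain` and `disablegossip` matters (the poster's latency went away after moving `disablegossip` to after `drain`). The sequence the discussion converges on, expressed as data rather than as a definitive recommendation, since `drain` itself already flushes memtables and halts gossip in modern versions:

```python
# A sketch of the shutdown ordering discussed in the thread, not an
# official recommendation; adjust for your Cassandra version.
SHUTDOWN_SEQUENCE = [
    "nodetool disablebinary",   # stop native-protocol (CQL) clients first
    "nodetool disablethrift",   # then stop thrift clients
    "nodetool drain",           # flush memtables; also halts gossip itself
]

def shutdown_script(seq=SHUTDOWN_SEQUENCE):
    return " && ".join(seq)

print(shutdown_script())
```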

Re: node restart causes application latency

2018-02-12 Thread Mike Torra
s that I moved `nodetool disablegossip` to after `nodetool drain`. This is pretty anecdotal, but is there any explanation for why this might happen? I'll be monitoring my cluster closely to see if this change does indeed fix the problem. On Mon, Feb 12, 2018 at 9:33 AM, Mike Torra <mto...@s

Re: node restart causes application latency

2018-02-12 Thread Mike Torra
Any other ideas? If I simply stop the node, there is no latency problem, but once I start the node the problem appears. This happens consistently for all nodes in the cluster On Wed, Feb 7, 2018 at 11:36 AM, Mike Torra <mto...@salesforce.com> wrote: > No, I am not > > On Wed, Fe

Re: node restart causes application latency

2018-02-07 Thread Mike Torra
No, I am not On Wed, Feb 7, 2018 at 11:35 AM, Jeff Jirsa <jji...@gmail.com> wrote: > Are you using internode ssl? > > > -- > Jeff Jirsa > > > On Feb 7, 2018, at 8:24 AM, Mike Torra <mto...@salesforce.com> wrote: > > Thanks for the feedback guys. That e

Re: node restart causes application latency

2018-02-07 Thread Mike Torra
drain do > the right thing), but in this case, your data model looks like the biggest > culprit (unless it's an incomplete recreation). > > - Jeff > > > On Tue, Feb 6, 2018 at 10:58 AM, Mike Torra <mto...@salesforce.com> wrote: > >> Hi - >> >> I

node restart causes application latency

2018-02-06 Thread Mike Torra
Hi - I am running a 29 node cluster spread over 4 DC's in EC2, using C* 3.11.1 on Ubuntu. Occasionally I have the need to restart nodes in the cluster, but every time I do, I see errors and application (nodejs) timeouts. I restart a node like this: nodetool disablethrift && nodetool

sstableloader limitations in multi-dc cluster

2017-06-22 Thread Mike Torra
I'm trying to use sstableloader to bulk load some data to my 4 DC cluster, and I can't quite get it to work. Here is how I'm trying to run it: sstableloader -d 127.0.0.1 -i {csv list of private ips of nodes in cluster} myks/mttest At first this seems to work, with a steady stream of logging

Re: changing compaction strategy

2017-03-14 Thread Mike Torra
to tell when/if the local node has successfully updated the compaction strategy? Looking at the sstable files, it seems like they are still based on STCS but I don't know how to be sure. Appreciate any tips or suggestions! On Mon, Mar 13, 2017 at 5:30 PM, Mike Torra <mto...@salesforce.com>

changing compaction strategy

2017-03-13 Thread Mike Torra
I'm trying to change compaction strategy one node at a time. I'm using jmxterm like this: `echo 'set -b org.apache.cassandra.db:type=ColumnFamilies,keyspace=my_ks,columnfamily=my_cf CompactionParametersJson
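The jmxterm command above is cut off, but the `CompactionParametersJson` attribute takes a JSON string as its value. A sketch of building and round-tripping a TWCS value before pasting it into jmxterm (the window settings are example values, not a recommendation):

```python
import json

params = {
    "class": "TimeWindowCompactionStrategy",
    "compaction_window_unit": "DAYS",
    "compaction_window_size": "1",
}
value = json.dumps(params)

# Sanity-check the string survives a round trip before handing it to JMX.
assert json.loads(value)["class"] == "TimeWindowCompactionStrategy"
print(value)
```

Note that a change applied via JMX affects only the local node and is not persisted across restarts; `ALTER TABLE ... WITH compaction = {...}` applies the change cluster-wide.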

Re: lots of connection timeouts around same time every day

2017-02-17 Thread Mike Torra
I can't say that I have tried that while the issue is going on, but I have done such rolling restarts for sure, and the timeouts still occur every day. What would a rolling restart do to fix the issue? In fact, as I write this, I am restarting each node one by one in the eu-west-1 datacenter, and

lots of connection timeouts around same time every day

2017-02-16 Thread Mike Torra
Hi there - Cluster info: C* 3.9, replicated across 4 EC2 regions (us-east-1, us-west-2, eu-west-1, ap-southeast-1), c4.4xlarge Around the same time every day (~7-8am EST), 2 DC's (eu-west-1 and ap-southeast-1) in our cluster start experiencing a high number of timeouts (Connection.TotalTimeouts

Re: implementing a 'sorted set' on top of cassandra

2017-01-17 Thread Mike Torra
m as cache data is volatile and can be evicted on demand. Whether this is effective also depends on the size of your sets. C* won't be able to sort them by score for you, so you will have to load the complete set into redis for caching and / or do the sorting in your app on demand. This certainly won't work
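Cassandra counters can absorb the increment-heavy load described in this thread, but as the reply notes, Cassandra cannot return a partition's rows ordered by a counter value, so the sort happens client-side. A toy stand-in for the counter table (the in-memory dict replaces the counter column; the CQL in the comment is illustrative):

```python
from collections import defaultdict

scores = defaultdict(int)   # stands in for a counter column keyed by member

def incr(member, by=1):
    # In Cassandra this would be: UPDATE ks.sets SET score = score + ? ...
    scores[member] += by

def top(n):
    """Read the whole set back and sort by score in the application."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]

incr("a", 5); incr("b", 2); incr("a", 1)
print(top(2))   # [('a', 6), ('b', 2)]
```

Since only ~5% of sets are ever read, paying the sort cost at read time (or caching the sorted result in redis on demand) fits the workload.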

implementing a 'sorted set' on top of cassandra

2017-01-13 Thread Mike Torra
We currently use redis to store sorted sets that we increment many, many times more than we read. For example, only about 5% of these sets are ever read. We are getting to the point where redis is becoming difficult to scale (currently at >20 nodes). We've started using cassandra for other

Re: weird jvm metrics

2017-01-04 Thread Mike Torra
Just bumping - has anyone seen this before? http://stackoverflow.com/questions/41446352/cassandra-3-9-jvm-metrics-have-bad-name From: Mike Torra <mto...@demandware.com> Reply-To: "user@cassandra.apache.org"

weird jvm metrics

2016-12-28 Thread Mike Torra
Hi There - I recently upgraded from cassandra 3.5 to 3.9 (DDC), and I noticed that the "new" jvm metrics are reporting with an extra '.' character in them. Here is a snippet of what I see from one of my nodes: ubuntu@ip-10-0-2-163:~$ sudo tcpdump -i eth0 -v dst port 2003 -A | grep 'jvm'
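Until the reporter bug behind the extra '.' is fixed, a metrics relay or aggregator can normalize the names before they reach graphite. A sketch (the sample metric names are invented to show the shape of the problem):

```python
import re

def clean_metric(name):
    """Collapse runs of dots and strip a trailing dot from a graphite path."""
    return re.sub(r"\.{2,}", ".", name).rstrip(".")

assert clean_metric("cassandra.node1.jvm.memory..heap.used") == \
       "cassandra.node1.jvm.memory.heap.used"
print(clean_metric("jvm.buffers..direct.capacity"))
```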

Re: failing bootstraps with OOM

2016-11-03 Thread Mike Torra
.apache.org>> Subject: Re: failing bootstraps with OOM On Wed, Nov 2, 2016 at 3:35 PM, Mike Torra <mto...@demandware.com> wrote: > > Hi All - > > I am trying to bootstrap a replacement node in a cluster, but it consistently > fails

failing bootstraps with OOM

2016-11-02 Thread Mike Torra
Hi All - I am trying to bootstrap a replacement node in a cluster, but it consistently fails to bootstrap because of OOM exceptions. For almost a week I've been going through cycles of bootstrapping, finding errors, then restarting / resuming bootstrap, and I am struggling to move forward.