Re: Hinted handoff throttled even after "nodetool sethintedhandoffthrottlekb 0"

2017-10-27 Thread Andrew Bialecki
...oad or something related to a timeout setting? On Fri, Oct 27, 2017 at 1:49 AM, Andrew Bialecki <andrew.biale...@klaviyo.com> wrote: > We have a 96 node cluster running 3.11 with 256 vnodes each. We're running > a rolling restart. As we restart nodes, we notice that each node t...

Hinted handoff throttled even after "nodetool sethintedhandoffthrottlekb 0"

2017-10-26 Thread Andrew Bialecki
...throughput? Or other reasons why hinted handoff runs so slowly? -- Andrew Bialecki
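
For reference, a minimal sketch of driving the command in the subject line from a script, assuming "nodetool" is on the PATH and can reach the node's JMX port (a hypothetical wrapper, not something from the thread):

    import subprocess

    # Hypothetical helper; 0 disables the hint throttle, any other value is KB/sec.
    def set_hint_throttle_kb(kb_per_sec, host="127.0.0.1"):
        subprocess.run(
            ["nodetool", "-h", host, "sethintedhandoffthrottlekb", str(kb_per_sec)],
            check=True,
        )

    set_hint_throttle_kb(0)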

Re: cassandra python driver routing requests to one node?

2016-11-14 Thread Andrew Bialecki
...at 6:26 PM, Alex Popescu wrote: > I'm wondering if what you are seeing is https://datastax-oss.atlassian.net/browse/PYTHON-643 (that could still be a sign of a > potential data hotspot) > > On Sun, Nov 13, 2016 at 10:57 PM, Andrew Bialecki <andrew.biale...@klaviyo.com>...

Re: cassandra python driver routing requests to one node?

2016-11-13 Thread Andrew Bialecki
...ight logging set to debug. On Mon, Nov 14, 2016 at 12:39 AM, Ben Slater wrote: > What load balancing policies are you using in your client code (https://datastax.github.io/python-driver/api/cassandra/policies.html)? > > Cheers > Ben > > On Mon, 14 Nov 2016 at 16:22 Andr...

cassandra python driver routing requests to one node?

2016-11-13 Thread Andrew Bialecki
We have an odd situation where all of a sudden our cluster started seeing a disproportionate number of writes go to one node. We're using the Python driver version 3.7.1. I'm not sure if this is a driver issue or possibly a network issue causing requests to get routed in an odd way. It's not abs...
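
For comparison, a minimal sketch of pinning the driver's load balancing policy explicitly (contact points, data center, and keyspace names are placeholders), which is the setting the replies above ask about:

    from cassandra.cluster import Cluster
    from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

    # Token-aware routing wrapped around DC-aware round robin; setting the
    # policy explicitly rules out surprises from whatever default is in effect.
    cluster = Cluster(
        contact_points=["10.0.0.1", "10.0.0.2", "10.0.0.3"],
        load_balancing_policy=TokenAwarePolicy(
            DCAwareRoundRobinPolicy(local_dc="dc1")
        ),
    )
    session = cluster.connect("my_keyspace")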

metrics not resetting after running proxyhistograms or cfhistograms

2016-10-25 Thread Andrew Bialecki
We're running 3.6. Running "nodetool proxyhistograms" twice, we're seeing the same data returned each time, but expecting the second run to be reset. We're seeing the same behavior with "nodetool cfhistograms." I believe resetting after each call used to be the behavior; did that change in recent...

High number of ReplicateOnWriteStage All timed blocked, counter CF

2013-10-22 Thread Andrew Bialecki
Hey everyone, We're stress testing writes for a few counter CFs and noticed on one node we got to the point where the ReplicateOnWriteStage thread pool was backed up and it started blocking those tasks. This cluster is six nodes, RF=3, running 1.2.9. All CFs have LCS with 160 MB sstables. All wri...

Re: Counters and replication

2013-08-05 Thread Andrew Bialecki
We've seen high CPU on stress tests with counters. With our workload, we had some hot counters (e.g. ones with 100s of increments/sec) with RF = 3, which caused the load to spike and replicate on write tasks to back up on those three nodes. Richard already gave a good overview of why this hap...

Re: sstable size change

2013-07-22 Thread Andrew Bialecki
My understanding is that deleting the .json metadata file is currently the only way. If you search the user list archives, there are folks who are building tools to force compaction and rebuild sstables with the new size. I believe there's been a bit of talk of potentially including those tools as a pat...

Re: Deletion use more space.

2013-07-16 Thread Andrew Bialecki
I don't think setting gc_grace_seconds to an hour is going to do what you'd expect. After gc_grace_seconds, if you haven't run a repair within that hour, the data you deleted will seem to have been undeleted. Someone correct me if I'm wrong, but in order to completely delete data and rega...
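
For concreteness, the one-hour setting being discussed would look roughly like this through the Python driver (keyspace and table names are made up):

    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect()

    # 3600 s = 1 hour. The catch described above: every replica has to be
    # repaired within this window, otherwise tombstones can be purged before
    # they reach all replicas and the deleted data appears to come back.
    session.execute(
        "ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 3600"
    )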

Re: node tool ring displays 33.33% owns on 3 node cluster with replication

2013-07-12 Thread Andrew Bialecki
Not sure if it's the best/intended behavior, but you should see it go back to 100% if you run: nodetool -h 127.0.0.1 -p 8080 ring . I think the rationale for showing 33% is that different keyspaces might have different RFs, so it's unclear what to show for ownership. However, if you include the ke...

Re: Lots of replicate on write tasks pending, want to investigate

2013-07-03 Thread Andrew Bialecki
> On Wed, Jul 3, 2013 at 9:59 AM, Andrew Bialecki > wrote: > >> 2. I'm assuming in our case the cause is incrementing counters because >> disk reads are part of the write path for counters and are not for >> appending columns to a row. Does that logic make sense? ...

Lots of replicate on write tasks pending, want to investigate

2013-07-03 Thread Andrew Bialecki
In one of our load tests, we're incrementing a single counter column as well as appending columns to a single row (essentially a timeline). You can think of it as counting the instances of an event and then keeping a timeline of those events. The ratio of increments to "appends" is 1:1. When we...
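
A rough sketch of that 1:1 workload through the Python driver, with made-up table names (event_counts is a counter table, event_timeline a wide, time-ordered row):

    from uuid import uuid1
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("my_keyspace")  # placeholder keyspace

    def record_event(event_type, payload):
        # Counter increment: counters do a read on the write path (see the
        # follow-up above), which is what backs up ReplicateOnWriteStage.
        session.execute(
            "UPDATE event_counts SET hits = hits + 1 WHERE event_type = %s",
            (event_type,),
        )
        # Timeline append: a plain insert, no read-before-write involved.
        session.execute(
            "INSERT INTO event_timeline (event_type, event_time, payload) "
            "VALUES (%s, %s, %s)",
            (event_type, uuid1(), payload),
        )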

Re: Counter value becomes incorrect after several dozen reads & writes

2013-06-25 Thread Andrew Bialecki
If you can reproduce the invalid behavior 10+% of the time with steps to repro that take 5-10s/iteration, that sounds extremely interesting for getting to the bottom of the invalid shard issue (if that's what the root cause ends up being). Would be very interested in the set up to see if the behavi...

Updated sstable size for LCS, ran upgradesstables, file sizes didn't change

2013-06-21 Thread Andrew Bialecki
We're considering increasing the size of our sstables for some column families from 10MB to something larger. In test, we've been trying to verify that the sstable file sizes change and then doing a bit of benchmarking. However, when we alter the column family and then run "nodetool ...
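
For reference, the table-level change under discussion looks roughly like this via the Python driver (names are placeholders; per the reply above, existing sstables keep their old size until they are rewritten by compaction):

    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect()

    # Bump the target LCS sstable size from the 10 MB default (placeholder
    # keyspace/table; 160 MB is just an example value).
    session.execute("""
        ALTER TABLE my_keyspace.my_table
        WITH compaction = {'class': 'LeveledCompactionStrategy',
                           'sstable_size_in_mb': 160}
    """)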

Re: Does replicate_on_write=true imply that CL.QUORUM for reads is unnecessary?

2013-06-02 Thread Andrew Bialecki
...Replicate on write should normally always be turned on, or the change >> will only be recorded on one node. Replicate on write is asynchronous >> with respect to the request and doesn't affect consistency level at >> all. >> >> >> On Wed, May 29, 2013 at 7:32 P...

Re: Does replicate_on_write=true imply that CL.QUORUM for reads is unnecessary?

2013-05-29 Thread Andrew Bialecki
...regardless of what you actually set it to (and for good reason). On Wed, May 29, 2013 at 9:47 AM, Andrew Bialecki wrote: > Quick question about counter columns. In looking at the replicate_on_write > setting, assuming you go with the default of "true", my understanding is it > writes th...

Does replicate_on_write=true imply that CL.QUORUM for reads is unnecessary?

2013-05-29 Thread Andrew Bialecki
Quick question about counter columns. In looking at the replicate_on_write setting, assuming you go with the default of "true", my understanding is it writes the increment to all replicas on any increment. If that's the case, doesn't that mean there's no point in using CL.QUORUM for reads because...
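
A small sketch of reading a counter at QUORUM with the Python driver (keyspace, table, and column names are invented); per the reply above, the extra replication from replicate_on_write happens asynchronously, so the read consistency level still matters:

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("my_keyspace")  # placeholder keyspace

    # A QUORUM read overlaps with a QUORUM write, which is what guarantees
    # you see an acknowledged increment; async replication alone does not.
    stmt = SimpleStatement(
        "SELECT views FROM page_views WHERE page_id = %s",
        consistency_level=ConsistencyLevel.QUORUM,
    )
    row = session.execute(stmt, ("home",)).one()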

Re: Observation on shuffling vs adding/removing nodes

2013-03-24 Thread Andrew Bialecki
... > Aaron Morton > Freelance Cassandra Consultant > New Zealand > > @aaronmorton > http://www.thelastpickle.com > > On 24/03/2013, at 9:41 AM, Andrew Bialecki > wrote: > > Just curious if anyone has any thoughts on something we've observed in a...

Observation on shuffling vs adding/removing nodes

2013-03-23 Thread Andrew Bialecki
Just curious if anyone has any thoughts on something we've observed in a small test cluster. We had around 100 GB of data on a 3 node cluster (RF=2) and wanted to start using vnodes. We upgraded the cluster to 1.2.2 and then followed the instructions for using vnodes. We initially tried to run a s...

Bootstrapping a node in 1.2.2

2013-03-19 Thread Andrew Bialecki
I've got a 3 node cluster in 1.2.2 and just bootstrapped a new node into it. For each of the existing nodes, I had num tokens set to 256 and for the new node I also had it set to 256; however, after bootstrapping into the cluster, "nodetool status " for my main keyspace which has RF=2 now reports:...

Re: Nodetool drain automatically shutting down node?

2013-03-08 Thread Andrew Bialecki
...forceFlush requested but everything is clean in Standard1 INFO [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,510 StorageService.java (line 774) DRAINED On Fri, Mar 8, 2013 at 10:36 PM, Andrew Bialecki wrote: > Hey all, > > We're getting ready to upgrade our cluster to 1...

Re: Running Cassandra 1.1, how can I see the efficiency of the key cache?

2012-12-22 Thread Andrew Bialecki
Thanks, I'll take a look at that too. I also found that "nodetool info" gives some information as well. For instance, here's what one node reads: Key Cache: size 104857584 (bytes), capacity 104857584 (bytes), 15085408 hits, 17336031 requests, 0.870 recent hit rate, 14400 save period in seconds.
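
The recent hit rate quoted there is just hits divided by requests; a quick check of the numbers:

    hits = 15085408
    requests = 17336031
    print("key cache hit rate: %.3f" % (hits / requests))  # ~0.870, as reported
    # size == capacity (104857584 bytes) means the key cache is currently full.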

Need to run nodetool repair on a cluster running 1.1.6 if no deletes

2012-12-22 Thread Andrew Bialecki
Hey everyone, I'm seeing some conflicting advice out there about whether you need to run nodetool repair within GCGraceSeconds with 1.x. Can someone clarify two things: (1) Do I need to run repair if I'm running 1.x? (2) Should I bother running repair if I don't have any deletes? Anything drawbac...

Running Cassandra 1.1, how can I see the efficiency of the key cache?

2012-12-22 Thread Andrew Bialecki
Since it's not in cfstats anymore, is there another way to monitor this? I'm working with a dev cluster and I've got Opscenter set up, so I tried taking a look through that, but it just shows "NO DATA." Does that mean the key cache isn't enabled? I haven't changed the defaults there, so the key ca...

Re: Simulating a failed node

2012-10-29 Thread Andrew Bialecki
Thanks, extremely helpful. The key bit was that I wasn't flushing the old Keyspace before re-running the stress test, so I was stuck at RF = 1 from a previous run despite passing RF = 2 to the stress tool. On Sun, Oct 28, 2012 at 2:49 AM, Peter Schuller wrote: > > Operation [158320] retried 10 times...

Re: Simulating a failed node

2012-10-27 Thread Andrew Bialecki
...RF and CL are you using? > > > On 2012/10/28, at 13:13, Andrew Bialecki > wrote: > > Hey everyone, > > I'm trying to simulate what happens when a node goes down to make sure my > cluster can gracefully handle node failures. For my setup I have a 3 node > cluster runni...

Simulating a failed node

2012-10-27 Thread Andrew Bialecki
Hey everyone, I'm trying to simulate what happens when a node goes down to make sure my cluster can gracefully handle node failures. For my setup I have a 3 node cluster running 1.1.5. I'm then using the stress tool included in 1.1.5 coming from an external server and running it with the following...