load or something related to a timeout setting?
On Fri, Oct 27, 2017 at 1:49 AM, Andrew Bialecki <andrew.biale...@klaviyo.com> wrote:
> We have a 96 node cluster running 3.11 with 256 vnodes each. We're running
> a rolling restart. As we restart nodes, we notice that each node t
throughput?
Or other reasons why hinted handoff runs so slowly?
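One thing worth checking is the hint delivery throttle: hinted_handoff_throttle_in_kb in cassandra.yaml is a cap that gets divided across the cluster, so on a large cluster the effective delivery rate per node can be tiny. A minimal sketch of raising it at runtime, assuming nodetool's sethintedhandoffthrottlekb subcommand and placeholder addresses:

import subprocess

# Placeholder node list; substitute real addresses.
NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# hinted_handoff_throttle_in_kb defaults to 1024 KB/s and is reduced
# proportionally to cluster size, so 96 nodes would leave very roughly
# 10 KB/s of hint delivery per target node.
NEW_THROTTLE_KB = 10240

for node in NODES:
    # Takes effect immediately; no restart required.
    subprocess.check_call(
        ["nodetool", "-h", node,
         "sethintedhandoffthrottlekb", str(NEW_THROTTLE_KB)])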
--
Andrew Bialecki
at 6:26 PM, Alex Popescu wrote:
> I'm wondering if what you are seeing is
> https://datastax-oss.atlassian.net/browse/PYTHON-643 (that could still be a
> sign of a potential data hotspot)
>
> On Sun, Nov 13, 2016 at 10:57 PM, Andrew Bialecki <andrew.biale...@klaviyo.com> wrote:
ight logging set to debug.
On Mon, Nov 14, 2016 at 12:39 AM, Ben Slater
wrote:
> What load balancing policies are you using in your client code (
> https://datastax.github.io/python-driver/api/cassandra/policies.html)?
>
> Cheers
> Ben
>
> On Mon, 14 Nov 2016 at 16:22 Andrew Bialecki wrote:
We have an odd situation where all of a sudden our cluster started
seeing a disproportionate number of writes go to one node. We're using the
Python driver version 3.7.1. I'm not sure if this is a driver issue or
possibly a network issue causing requests to get routed in an odd way. It's
not abs
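For reference, a typical 3.x Python driver setup looks like this (addresses, DC name, and keyspace are placeholders); token-aware routing should spread coordination across replicas, so both a hot partition and a misconfigured policy are worth ruling out:

from cassandra.cluster import Cluster
from cassandra.policies import TokenAwarePolicy, DCAwareRoundRobinPolicy

# Token-aware routing sends each request to a replica that owns the
# partition key; without it, or with one hot partition, a single node
# can absorb a disproportionate share of the writes.
cluster = Cluster(
    contact_points=["10.0.0.1", "10.0.0.2"],
    load_balancing_policy=TokenAwarePolicy(
        DCAwareRoundRobinPolicy(local_dc="DC1")),
)
session = cluster.connect("my_keyspace")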
We're running 3.6. Running "nodetool proxyhistograms" twice, we're seeing
the same data returned each time, but expecting the second run to be reset.
We're seeing the same behavior with "nodetool cfhistograms."
I believe resetting after each call used to be the behavior; did that
change in a recent version?
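In the meantime, a workaround is to capture the output yourself and diff successive runs. A rough sketch (assumes nodetool is on PATH; the line-by-line comparison is purely illustrative):

import subprocess, time

def proxyhistograms(host="127.0.0.1"):
    # Grab the raw text; since the histograms no longer reset between
    # calls, deltas have to be computed externally.
    return subprocess.check_output(
        ["nodetool", "-h", host, "proxyhistograms"],
        universal_newlines=True)

before = proxyhistograms()
time.sleep(60)  # let some traffic through
after = proxyhistograms()
for b, a in zip(before.splitlines(), after.splitlines()):
    if b != a:
        print("was: %s\nnow: %s" % (b, a))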
Hey everyone,
We're stress testing writes for a few counter CFs and noticed on one node
we got to the point where the ReplicateOnWriteStage thread pool was backed
up and it started blocking those tasks. This cluster is six nodes, RF=3,
running 1.2.9. All CFs have LCS with 160 MB sstables. All wri
We've seen high CPU in stress tests with counters. With our
workload, we had some hot counters (e.g. ones with 100s of increments/sec)
with RF = 3, which caused the load to spike and replicate on write tasks to
back up on those three nodes. Richard already gave a good overview of why
this hap
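One mitigation often suggested for hot counters (not necessarily what was used here) is to shard each logical counter across several partitions so increments spread over more replica sets, summing the shards on read. A sketch with the modern Python driver and a hypothetical table (CREATE TABLE event_counts (id text, shard int, value counter, PRIMARY KEY ((id, shard)))):

import random
from cassandra.cluster import Cluster

NUM_SHARDS = 16  # one hot counter becomes 16 partitions
session = Cluster(["10.0.0.1"]).connect("my_keyspace")  # placeholders

def increment(counter_id):
    # Pick a random shard per increment, so no single replica set takes
    # every update for a hot counter.
    shard = random.randrange(NUM_SHARDS)
    session.execute(
        "UPDATE event_counts SET value = value + 1 "
        "WHERE id = %s AND shard = %s", (counter_id, shard))

def read_total(counter_id):
    # Reads pay the cost: all shards must be fetched and summed.
    total = 0
    for shard in range(NUM_SHARDS):
        row = session.execute(
            "SELECT value FROM event_counts "
            "WHERE id = %s AND shard = %s", (counter_id, shard)).one()
        if row:
            total += row.value
    return total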
My understanding is deleting the .json metadata file is the only way
currently. If you search the user list archives, there are folks who are
building tools to force compaction and rebuild sstables with the new size.
I believe there's been a bit of talk of potentially including those tools
as a pat
I don't think setting gc_grace_seconds to an hour is going to do what you'd
expect. After gc_grace_seconds, if you haven't run a repair within that
hour, the data you deleted will seem to have been undeleted.
Someone correct me if I'm wrong, but in order to completely delete
data and rega
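For what it's worth, the setting itself is a per-table CQL option; a minimal sketch with the Python driver (keyspace and table hypothetical), mostly to make the invariant explicit:

from cassandra.cluster import Cluster

session = Cluster(["10.0.0.1"]).connect()  # placeholder contact point

# Tombstones become eligible for collection gc_grace_seconds after the
# delete; if any replica missed the delete and isn't repaired within
# that window, the deleted data can come back.
session.execute(
    "ALTER TABLE my_keyspace.events WITH gc_grace_seconds = 3600")
# A one-hour grace period therefore implies a full, successful repair
# cycle every hour, which is rarely practical.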
Not sure if it's the best/intended behavior, but you should see it go back
to 100% if you run "nodetool -h 127.0.0.1 -p 8080 ring".
I think the rationale for showing 33% is that different keyspaces might
have different RFs, so it's unclear what to show for ownership. However, if
you include the keyspace
> On Wed, Jul 3, 2013 at 9:59 AM, Andrew Bialecki wrote:
>
>> 2. I'm assuming in our case the cause is incrementing counters because
>> disk reads are part of the write path for counters and are not for
>> appending columns to a row. Does that logic make sense?
In one of our load tests, we're incrementing a single counter column as
well as appending columns to a single row (essentially a timeline). You can
think of it as counting the instances of an event and then keeping a
timeline of those events. The ratio of increments to "appends" is 1:1.
When we
If you can reproduce the invalid behavior 10+% of the time with steps to
repro that take 5-10s/iteration, that sounds extremely interesting for
getting to the bottom of the invalid shard issue (if that's what the root
cause ends up being). Would be very interested in the set up to see if the
behavi
We're potentially considering increasing the size of our sstables for some
column families from 10MB to something larger.
In test, we've been trying to verify that the sstable file sizes change and
then doing a bit of benchmarking. However, when we alter the column
family and then run "nodetool
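In case it helps anyone searching later: on CQL3 tables the size lives in the compaction options, e.g. (hypothetical table; as discussed above, sstables already on disk keep their old size until rewritten):

from cassandra.cluster import Cluster

session = Cluster(["10.0.0.1"]).connect()  # placeholder

# Only newly written LCS sstables use the new target size; existing
# files stay as-is until compaction (or a forced rewrite) touches them.
session.execute(
    "ALTER TABLE my_keyspace.events WITH compaction = "
    "{'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160}")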
>> Replicate on write should normally always be turned on, or the change
>> will only be recorded on one node. Replicate on write is asynchronous
>> with respect to the request and doesn't affect consistency level at
>> all.
>>
>>
>> On Wed, May 29, 2013 at 7:32 P
regardless of what you actually set it to (and for good reason).
On Wed, May 29, 2013 at 9:47 AM, Andrew Bialecki
wrote:
> Quick question about counter columns. In looking at the replicate_on_write
> setting, assuming you go with the default of "true", my understanding is it
> writes th
Quick question about counter columns. In looking at the replicate_on_write
setting, assuming you go with the default of "true", my understanding is it
writes the increment to all replicas on any increment.
If that's the case, doesn't that mean there's no point in using CL.QUORUM
for reads because
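For concreteness, the read CL is chosen per statement in the drivers; a small sketch with the modern Python driver (table and key hypothetical), which at least shows where the knob sits:

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(["10.0.0.1"]).connect("my_keyspace")  # placeholders

# replicate_on_write pushes increments to every replica, but (per the
# reply above) asynchronously, so the read CL below still determines
# how many replicas must answer before a value is returned.
stmt = SimpleStatement(
    "SELECT value FROM event_counts WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM)
row = session.execute(stmt, ("signups",)).one()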
-
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 24/03/2013, at 9:41 AM, Andrew Bialecki
> wrote:
>
> Just curious if anyone has any thoughts on something we've observed in a
Just curious if anyone has any thoughts on something we've observed in a
small test cluster.
We had around 100 GB of data on a 3 node cluster (RF=2) and wanted to start
using vnodes. We upgraded the cluster to 1.2.2 and then followed the
instructions for using vnodes. We initially tried to run a shuffle
I've got a 3 node cluster in 1.2.2 and just bootstrapped a new node into
it. For each of the existing nodes, I had num_tokens set to 256 and for the
new node I also had it set to 256, however after bootstrapping into the
cluster, "nodetool status " for my main keyspace which has RF=2
now reports:
forceFlush requested but everything is clean in Standard1
INFO [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,510 StorageService.java (line 774) DRAINED
On Fri, Mar 8, 2013 at 10:36 PM, Andrew Bialecki
wrote:
> Hey all,
>
> We're getting ready to upgrade our cluster to 1.
Thanks, I'll take a look at that too.
I also found that "nodetool info" gives some information as well. For
instance, here's what one node reads:
Key Cache: size 104857584 (bytes), capacity 104857584 (bytes), 15085408 hits, 17336031 requests, 0.870 recent hit rate, 14400 save period in seconds.
Hey everyone,
I'm seeing some conflicting advice out there about whether you need to run
nodetool repair within GCGraceSeconds with 1.x. Can someone clarify two
things:
(1) Do I need to run repair if I'm running 1.x?
(2) Should I bother running repair if I don't have any deletes? Any
drawbacks?
Since it's not in cfstats anymore, is there another way to monitor this?
I'm working with a dev cluster and I've got Opscenter set up, so I tried
taking a look through that, but it just shows "NO DATA." Does that mean the
key cache isn't enabled? I haven't changed the defaults there, so the key
cache
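Until it shows up in the dashboard, the "Key Cache" line from "nodetool info" (sample output quoted elsewhere in this thread) is easy to poll. A rough sketch:

import re
import subprocess

def key_cache_hit_rate(host="127.0.0.1"):
    out = subprocess.check_output(
        ["nodetool", "-h", host, "info"], universal_newlines=True)
    # Matches e.g. "... 0.870 recent hit rate ..." on the Key Cache line.
    m = re.search(r"Key Cache\s*:.*?([\d.]+|NaN) recent hit rate", out)
    if m is None or m.group(1) == "NaN":
        return None  # cache disabled or no requests yet
    return float(m.group(1))

print(key_cache_hit_rate())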
Thanks, extremely helpful. The key bit was I wasn't flushing the old
Keyspace before re-running the stress test, so I was stuck at RF = 1 from a
previous run despite passing RF = 2 to the stress tool.
On Sun, Oct 28, 2012 at 2:49 AM, Peter Schuller wrote:
> > Operation [158320] retried 10 times
What RF and CL are you using?
>
>
> On 2012/10/28, at 13:13, Andrew Bialecki
> wrote:
>
> Hey everyone,
>
> I'm trying to simulate what happens when a node goes down to make sure my
> cluster can gracefully handle node failures. For my setup I have a 3 node
> cluster runni
Hey everyone,
I'm trying to simulate what happens when a node goes down to make sure my
cluster can gracefully handle node failures. For my setup I have a 3 node
cluster running 1.1.5. I'm then using the stress tool included in 1.1.5
coming from an external server and running it with the following
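(A hypothetical invocation of the 1.1-era stress tool for this kind of test; the path, addresses, and flags below are illustrative, not the original poster's arguments:)

import subprocess

# Write 1M keys at RF=2 with QUORUM, then kill one node mid-run and
# watch for retries/timeouts in the stress output.
subprocess.check_call([
    "tools/bin/stress",                  # location varies by distribution
    "-d", "10.0.0.1,10.0.0.2,10.0.0.3",  # placeholder 3-node cluster
    "-n", "1000000",                     # number of keys
    "-l", "2",                           # replication factor for the test keyspace
    "-e", "QUORUM",                      # per-operation consistency level
])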