For a variety of reasons, we have clusters with 5 TB of disk per host as a
“standard.” In our larger data clusters, it does take longer to add/remove
nodes or do things like upgradesstables after an upgrade. These nodes have 3+TB
of actual data on the drive. But, we were able to shrink the node
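As a point of reference, the sstable-rewrite step after an upgrade can be parallelized per node; a rough sketch (the -j value is illustrative and should match what the disks can sustain):

    # Rewrite sstables to the new on-disk format after a binary upgrade.
    # -j controls how many sstables are rewritten in parallel on this node.
    nodetool upgradesstables -j 4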
Cost of availability is a fair question at some level of the discussion. In my
experience, high availability is one of the top 2 or 3 reasons why Cassandra is
chosen as the data solution. So, if I am given a Cassandra use case to build
out, I would normally assume high availability is needed,
In my experience C* is not cheaper storage than HDFS. If that is the goal, it
may be painful.
Each Cassandra DC has at least one full copy of the data set. For production
data that I care about (that my app teams care about), we use RF=3 in each
Cassandra DC. And I only use 1 Cassandra rack
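As an illustration only (keyspace and data center names are made up), that layout corresponds to a keyspace definition like:

    cqlsh -e "CREATE KEYSPACE IF NOT EXISTS app_data
      WITH replication = {'class': 'NetworkTopologyStrategy',
                          'dc_east': 3, 'dc_west': 3};"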
Aggregate queries (like count(*)) are fine *within* a reasonably sized
partition (under 100 MB in size). However, Cassandra is not the right tool if
you want to do aggregate queries *across* partitions (unless you break up the
work with something like Spark). Choosing the right partition key
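A hypothetical example of the distinction (table and keyspace names are made up): the first count stays inside one partition and is fine; the second fans out across every partition and should be handed to something like Spark instead:

    cqlsh -e "
      CREATE TABLE IF NOT EXISTS metrics.readings (
        sensor_id   text,
        reading_ts  timestamp,
        value       double,
        PRIMARY KEY (sensor_id, reading_ts));

      -- fine: scoped to a single partition
      SELECT count(*) FROM metrics.readings WHERE sensor_id = 'sensor-42';

      -- avoid: scans every partition on every node
      SELECT count(*) FROM metrics.readings;"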
Subject: [EXTERNAL] Re: Best compaction strategy for rarely used data
On 2022-12-29 21:54, Durity, Sean R via user wrote:
> At some point you will end up with large sstables (like 1 TB) that won’t
> compact because there are not 4 similar-sized ones able to be compacted
Yes, that's exactly
If there isn’t a TTL and timestamp on the data, I’m not sure of the benefits of
TWCS for this use case. I would stick with size-tiered. At some point you will
end up with large sstables (like 1 TB) that won’t compact because there are not
4 similar-sized ones able to be compacted (assuming default settings).
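The "4" there is STCS's min_threshold; a sketch of the relevant knobs (table name is hypothetical):

    cqlsh -e "ALTER TABLE archive.events
      WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                         'min_threshold': 4, 'max_threshold': 32};"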
I have seen this when there is a tab character in the yaml file. YAML is (too)
picky about these things.
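A quick way to spot that (assuming GNU grep and the packaged config path):

    # Print any cassandra.yaml line that contains a literal tab character
    grep -nP '\t' /etc/cassandra/cassandra.yaml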
Sean R. Durity
DB Solutions
Staff Systems Engineer – Cassandra
From: Amit Patel via user
Sent: Thursday, December 8, 2022 11:38 AM
To: Arvydas Jonusonis; user@cassandra.apache.org
Subject:
Does it need to be strictly Apache Cassandra? Or is something built on/working
with DataStax Enterprise allowed? I would think if it doesn’t depend on
DSE-only technology, it could still apply to a general Cassandra audience.
Sean R. Durity
From: Patrick McFadin
Sent: Tuesday, November 29,
For physical hardware, when disks fail, I do a removenode, wait for the drive
to be replaced, reinstall Cassandra, and then bootstrap the node back in (and
run clean-up across the DC).
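Roughly, that sequence looks like the sketch below (the host ID is a placeholder):

    # From any live node: drop the dead node out of the ring
    nodetool removenode <host-id-of-failed-node>

    # After the drive is replaced and Cassandra is reinstalled, start the
    # node so it bootstraps back in, then on every OTHER node in that DC:
    nodetool cleanup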
All of our disks are presented as one file system for data, which is not what
the original question was about.
From the subject, this looks like a client-side timeout (thrown by the driver).
I have seen situations where the client/driver timeout of 2 seconds is shorter
than the server-side timeout (10 seconds). So, the server doesn’t really note
any problem. Unless this is a very remote client
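When chasing this, it is worth comparing both sides explicitly; a sketch (the yaml path is the packaged default, and the driver setting shown is the Java driver 4.x one):

    # Server-side read timeout (5000 ms by default unless overridden):
    grep read_request_timeout /etc/cassandra/cassandra.yaml

    # Client side: e.g. the Java driver 4.x uses basic.request.timeout
    # (2 seconds by default) in its application.conf.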
On Thu, Feb 16, 2023 at 9:43 AM Marc Hoppins <marc.hopp...@eset.com> wrote:
compaction_throughput_mb_per_sec is 0 in cassandra.yaml. Is setting it via
nodetool going to provide any increase?
From: Durity, Sean R via user <user@cassandra.apache.org>
Sent: Thursday, February 16, 2023
Clean-up is constrained/throttled by compactionthroughput. If your system can
handle it, you can increase that throughput (nodetool setcompactionthroughput)
for the clean-up in order to reduce the total time.
It is a node-isolated operation, not cluster-involved. I often run clean up on
all
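For example (the values and keyspace name are illustrative; units are MB/s, and 0 means unthrottled):

    # Check the current throttle, raise it for the clean-up window,
    # run the clean-up, then put it back.
    nodetool getcompactionthroughput
    nodetool setcompactionthroughput 128
    nodetool cleanup my_keyspace
    nodetool setcompactionthroughput 64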
In most cases, I would delete the corrupt commit log file and restart. Then run
repairs on that node. I have seen cases where multiple files are corrupted and
it is easier to remove all commit log files to get the node restarted.
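A hedged sketch of that recovery on one node (paths are the package defaults and may differ; removing segments discards any unflushed writes, hence the repair):

    sudo systemctl stop cassandra
    # Remove the corrupt segment (or all segments, if several are bad)
    sudo rm /var/lib/cassandra/commitlog/CommitLog-*.log
    sudo systemctl start cassandra
    # Bring the node back in sync with its replicas
    nodetool repair -pr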
Sean R. Durity
From: Joe Obernberger
Sent: Friday, February 3,
We also parse the output from nodetool info and nodetool status and (to a
lesser degree) nodetool netstats. We have basically made info and status more
operator-friendly in a multi-cluster environment. (And we added a usable
return value to our info command that we can use to evaluate the
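Not our actual tooling, but as an illustration of that kind of wrapping, something like this pulls address, state, and load out of nodetool status (column layout can vary slightly between versions):

    nodetool status | awk '/^[UD][NLJM] / {print $2, $1, $3" "$4}'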
I would expect single digit ms latency on reads and writes. However, we have
not done any performance testing on Apache Cassandra 4.x.
Sean R. Durity
From: Shaurya Gupta
Sent: Friday, August 11, 2023 1:16 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Cassandra p95
I run clean-up in parallel, not serially, since it is a node-only kind of
operation. And I only run in the impacted DC. With only 300 GB on a node,
clean-up should not take very long. Check your compactionthroughput.
I ran clean-up in parallel on 53 nodes with over 3 TB of data each. It took
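A sketch of that kind of parallel run (host list file, ssh access, and keyspace name are all assumptions):

    # Kick off clean-up on every node in the impacted DC at once
    while read -r host; do
        ssh "$host" "nodetool cleanup my_keyspace" &
    done < dc1_hosts.txt
    wait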
The k8ssandra requirement is a major blocker.
Sean R. Durity
From: Christopher Bradford
Sent: Tuesday, February 27, 2024 9:49 PM
To: user@cassandra.apache.org
Cc: Christopher Bradford
Subject: [EXTERNAL] Re: Check out new features in K8ssandra and Mission Control
Hey Jon,