RE: Big Data Question

2023-08-17 Thread Durity, Sean R via user
For a variety of reasons, we have clusters with 5 TB of disk per host as a “standard.” In our larger data clusters, it does take longer to add/remove nodes or do things like upgradesstables after an upgrade. These nodes have 3+TB of actual data on the drive. But, we were able to shrink the node

RE: Big Data Question

2023-08-18 Thread Durity, Sean R via user
Cost of availability is a fair question at some level of the discussion. In my experience, high availability is one of the top 2 or 3 reasons why Cassandra is chosen as the data solution. So, if I am given a Cassandra use case to build out, I would normally assume high availability is needed,

RE: Adding nodes

2022-07-12 Thread Durity, Sean R via user
In my experience C* is not cheaper storage than HDFS. If that is the goal, it may be painful. Each Cassandra DC has at least one full copy of the data set. For production data that I care about (that my app teams care about), we use RF=3 in each Cassandra DC. And I only use 1 Cassandra rack

RE: Questions on the count and multiple index behaviour in cassandra

2022-09-29 Thread Durity, Sean R via user
Aggregate queries (like count(*)) are fine *within* a reasonably sized partition (under 100 MB in size). However, Cassandra is not the right tool if you want to do aggregate queries *across* partitions (unless you break up the work with something like Spark). Choosing the right partition key

RE: Best compaction strategy for rarely used data

2022-12-30 Thread Durity, Sean R via user
On 2022-12-29 21:54, Durity, Sean R via user wrote:
> At some point you will end up with large sstables (like 1 TB) that won’t compact because there are not 4 similar-sized ones able to be compacted
Yes, that's exactly

RE: Best compaction strategy for rarely used data

2022-12-29 Thread Durity, Sean R via user
If there isn’t a TTL and timestamp on the data, I’m not sure of the benefits of TWCS for this use case. I would stick with size-tiered. At some point you will end up with large sstables (like 1 TB) that won’t compact because there are not 4 similar-sized ones able to be compacted (assuming default
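
The "4 similar-sized ones" condition can be illustrated with a simplified sketch of size-tiered bucketing. This is not Cassandra's actual implementation: it assumes the commonly documented defaults (bucket_low=0.5, bucket_high=1.5, min_threshold=4), ignores min_sstable_size, and the function names and sizes are made up for illustration.

```python
def bucket_sstables(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group sstable sizes into buckets of similar size (simplified sketch)."""
    buckets = []  # each bucket is a list of sizes
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            # A size joins a bucket if it is within [low, high] x the bucket average.
            if avg * bucket_low <= size <= avg * bucket_high:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets


def compaction_candidates(sizes, min_threshold=4):
    """Return only the buckets holding enough similar-sized sstables to compact."""
    return [b for b in bucket_sstables(sizes) if len(b) >= min_threshold]


# Illustrative sizes in GB: four small sstables and two large ones.
sizes_gb = [10, 11, 9, 12, 1000, 950]
print(compaction_candidates(sizes_gb))  # [[9, 10, 11, 12]]
```

In this sketch the two roughly 1 TB sstables land in a bucket of their own and stay below the threshold of 4, so they are never selected.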

RE: Cassandra 4.0.7 - issue - service not starting

2022-12-08 Thread Durity, Sean R via user
I have seen this when there is a tab character in the yaml file. Yaml is (too) picky on these things.
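
One way to check for this is to scan the file's text for tab characters before starting the service. The sketch below is only an illustration; the helper name find_tabs and the sample text are hypothetical, not part of any Cassandra tooling.

```python
def find_tabs(text):
    """Return (line_number, column) for every tab character, both 1-based."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == "\t":
                hits.append((line_no, col))
    return hits


# Made-up two-line sample with a tab used as indentation on line 2.
sample = "cluster_name: 'Test'\n\tnum_tokens: 16\n"
print(find_tabs(sample))  # [(2, 1)]
```

For a real file, the text would be read from wherever cassandra.yaml lives on that host and passed to the same function.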

RE: Cassandra Summit CFP update

2022-11-30 Thread Durity, Sean R via user
Does it need to be strictly Apache Cassandra? Or is something built on/working with DataStax Enterprise allowed? I would think if it doesn’t depend on DSE-only technology, it could still apply to a general Cassandra audience.

RE: Failed disks - correct procedure

2023-01-17 Thread Durity, Sean R via user
For physical hardware when disks fail, I do a removenode, wait for the drive to be replaced, reinstall Cassandra, and then bootstrap the node back in (and run clean-up across the DC). All of our disks are presented as one file system for data, which is not what the original question was

RE: Query drivertimeout PT2S

2022-11-09 Thread Durity, Sean R via user
From the subject, this looks like a client-side timeout (thrown by the driver). I have seen situations where the client/driver timeout of 2 seconds is a shorter timeout than on the server side (10 seconds). So, the server doesn’t really note any problem. Unless this is a very remote client

RE: Cleanup

2023-02-17 Thread Durity, Sean R via user
Feb 16, 2023 at 9:43 AM Marc Hoppins <marc.hopp...@eset.com> wrote: Compaction_throughtput_per_mb is 0 in cassandra.yaml. Is setting it in nodetool going to provide any increase? From: Durity, Sean R via user <user@cassandra.apache.org> Sent: Thursday, February 16, 2023

RE: Cleanup

2023-02-16 Thread Durity, Sean R via user
Clean-up is constrained/throttled by compactionthroughput. If your system can handle it, you can increase that throughput (nodetool setcompactionthroughput) for the clean-up in order to reduce the total time. It is a node-isolated operation, not cluster-involved. I often run clean-up on all
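
As a rough illustration of why the throttle matters, the arithmetic below estimates how long it takes to rewrite a given amount of data at a given throughput cap. It is a back-of-the-envelope sketch: the function name and the numbers are illustrative assumptions, and real clean-up time also depends on disk speed and on how much data actually has to be rewritten.

```python
def estimated_hours(data_gb, throughput_mb_per_sec):
    """Rough time to rewrite data_gb at a given throttle (1 GB = 1024 MB)."""
    if throughput_mb_per_sec <= 0:
        # A setting of 0 means unthrottled, so the throttle alone gives no estimate.
        raise ValueError("throughput must be positive to estimate a duration")
    seconds = data_gb * 1024 / throughput_mb_per_sec
    return seconds / 3600


# Illustrative numbers only: 300 GB at 16 MB/s versus 64 MB/s.
print(round(estimated_hours(300, 16), 2))  # 5.33
print(round(estimated_hours(300, 64), 2))  # 1.33
```

Raising the cap fourfold cuts the estimate to a quarter, which is the effect nodetool setcompactionthroughput is meant to achieve when the hardware can keep up.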

RE: Startup fails - 4.1.0

2023-02-03 Thread Durity, Sean R via user
In most cases, I would delete the corrupt commit log file and restart. Then run repairs on that node. I have seen cases where multiple files are corrupted and it is easier to remove all commit log files to get the node restarted.

RE: Survey about the parsing of the tooling's output

2023-07-10 Thread Durity, Sean R via user
We also parse the output from nodetool info and nodetool status and (to a lesser degree) nodetool netstats. We have basically made info and status more operator-friendly in a multi-cluster environment. (And we added a useable return value to our info command that we can use to evaluate the

RE: Cassandra p95 latencies

2023-08-11 Thread Durity, Sean R via user
I would expect single digit ms latency on reads and writes. However, we have not done any performance testing on Apache Cassandra 4.x.

RE: Is cleanup is required if cluster topology changes

2023-05-05 Thread Durity, Sean R via user
I run clean-up in parallel, not serially, since it is a node-only kind of operation. And I only run in the impacted DC. With only 300 GB on a node, clean-up should not take very long. Check your compactionthroughput. I ran clean-up in parallel on 53 nodes with over 3 TB of data each. It took

RE: Check out new features in K8ssandra and Mission Control

2024-02-28 Thread Durity, Sean R via user
The k8ssandra requirement is a major blocker.