A few thoughts on this:

– 80TB per machine is pretty dense. Consider the amount of data you'd need to re-replicate in the event of a hardware failure that takes down all 80TB (a DIMM failure requiring replacement, a non-redundant PSU failure, a NIC, etc.).

– 24GB of heap is also pretty generous. Depending on how you're using Cassandra, you may be able to get by with roughly half of this (though keep in mind the additional direct memory/offheap space required if you're using off-heap merkle trees).

– 40 instances per machine can be a lot to manage. You can reduce this and address multiple physical drives per instance either by RAID-0'ing them together, or by running Cassandra in a JBOD configuration (multiple data dirs per instance).

– Remember to consider the ratio of available CPU to the amount of storage you're addressing per machine in your configuration. It's easy to spec a box that maxes out on disk without enough oomph to serve user queries and compaction over that much storage.

– You'll want to run some smaller-scale perf testing to determine this ratio. The good news is that what you mostly need to stress is the throughput of a replica set rather than an entire cluster. Small-scale POCs will generally map well to larger clusters, so long as the total count of Cassandra processes isn't more than a couple thousand.

– At this scale, small improvements can go a very long way. If your data is compressible (i.e., not pre-compressed/encrypted prior to being stored in Cassandra), you'll likely want to use ZStandard rather than LZ4, and possibly at a higher ratio than the default. Test a set of input data with different ZStandard compression levels. You may save more than 10% of storage relative to LZ4 by doing so without sacrificing much in terms of CPU.
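For the last point, switching a table from the LZ4 default to ZStandard is a table-level compression option. A sketch, assuming Cassandra 4.0+ (where `ZstdCompressor` and its `compression_level` option are available); the table name and level shown are illustrative:

```sql
-- Hypothetical table; ZstdCompressor requires Cassandra 4.0 or later.
ALTER TABLE my_keyspace.my_table
  WITH compression = {
    'class': 'ZstdCompressor',
    'compression_level': 5   -- default is 3; higher levels trade CPU for ratio
  };
```

Note that existing SSTables only pick up the new compressor as they are rewritten by compaction (or by forcing a rewrite with `nodetool upgradesstables -a`), so storage savings arrive gradually.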
On Aug 17, 2023, at 7:46 AM, Joe Obernberger <joseph.obernber...@gmail.com> wrote:

Thanks for this - yeah - duh - forgot about replication in my example!

So - is 2TBytes per Cassandra instance advisable? Better to use more/less? Modern 2u servers can be had with 24 3.8TByte SSDs; so assuming 80TBytes per server, you could do: (1024*3)/80 = 39 servers. But you'd have to run 40 instances of Cassandra on each server; maybe 24G of heap per instance, so a server with 1TByte of RAM would work.

Is this what folks would do?

-Joe
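A quick sanity check of the arithmetic above. This is a sketch using the thread's own figures (1PB at RF=3, ~80TB usable per 2U server, 2TB and 24GB heap per instance):

```python
import math

# Figures from the thread: 1 PB (1024 TB) of data, RF=3, ~80 TB usable per
# 2U server (24 x 3.8 TB SSDs, rounded down), 2 TB of disk and 24 GB of
# heap per Cassandra instance.
data_tb = 1024
rf = 3
usable_tb_per_server = 80
tb_per_instance = 2
heap_gb_per_instance = 24

servers = math.ceil(data_tb * rf / usable_tb_per_server)        # 3072/80 -> 39
instances_per_server = usable_tb_per_server // tb_per_instance  # 40
heap_gb_per_server = instances_per_server * heap_gb_per_instance  # 960

print(servers, instances_per_server, heap_gb_per_server)  # 39 40 960
```

So roughly 39 such servers, each running 40 instances with 960GB of total heap, which is consistent with the 1TByte-of-RAM estimate.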
On 8/17/2023 9:13 AM, Bowen Song via user wrote:

Just pointing out the obvious: for 1PB of data on nodes with 2TB of disk each, you will need far more than 500 nodes.

1. It is unwise to run Cassandra with replication factor 1. It usually makes sense to use RF=3, so 1PB of data will cost 3PB of storage space, a minimum of 1500 such nodes.

2. Depending on the compaction strategy you use and the write access pattern, there's disk space amplification to consider. For example, with STCS, the disk usage can be many times the actual live data size.

3. You will need some extra free disk space as temporary space for running compactions.

4. The data is rarely going to be perfectly evenly distributed among all nodes; you need to take that into consideration and size the nodes based on the node with the most data.

5. Enough bad news, here's a good one: compression will save you a lot of disk space!

With all of the above considered, you will probably end up with a lot more than the 500 nodes you initially thought. Your choice of compaction strategy and compression ratio can dramatically affect this calculation.
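Points 1-5 above can be folded into a rough capacity model. This is a sketch only; the headroom, imbalance, and compression figures are illustrative assumptions, not recommendations, and real numbers depend heavily on compaction strategy and data shape:

```python
import math

# Illustrative assumptions (not from the thread): keep ~50% of disk free for
# compaction/space amplification, hottest node carries ~20% extra data, and
# compression achieves roughly 2:1.
data_tb = 1024              # 1 PB of live, uncompressed data
rf = 3                      # point 1: replication factor
compaction_headroom = 0.5   # points 2-3: fraction of disk kept free
imbalance = 1.2             # point 4: hottest node holds ~20% more than average
compression_ratio = 0.5     # point 5: compressed size / raw size
disk_per_node_tb = 2

usable_per_node_tb = disk_per_node_tb * (1 - compaction_headroom) / imbalance
stored_tb = data_tb * rf * compression_ratio
nodes = math.ceil(stored_tb / usable_per_node_tb)
print(nodes)
```

Even with 2:1 compression doing a lot of work, the headroom and imbalance terms push the count well past the naive 1500; the compaction strategy (which drives the headroom term) moves the answer the most.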
On 16/08/2023 16:33, Joe Obernberger wrote:

General question on how to configure Cassandra. Say I have 1PByte of data to store. The general rule of thumb is that each node (or at least each instance of Cassandra) shouldn't handle more than 2TBytes of disk. That means 500 instances of Cassandra.

Assuming you have very fast persistent storage (such as a NetApp, PorterWorx etc.), would using Kubernetes or some orchestration layer to handle those nodes be a viable approach? If the worker nodes had enough RAM to run 4 instances (pods) of Cassandra each, you would need 125 servers.

Another approach is to build your servers with 5 (or more) SSD devices - one for the OS, four for the four instances of Cassandra running on that server. Then build some scripts/ansible/puppet that would manage Cassandra start/stops and other maintenance items.

Where I think this runs into problems is with repairs, or sstablescrubs, that can take days to run on a single instance. How is that handled 'in the real world'? With seed nodes, how many would you have in such a configuration?

Thanks for any thoughts!

-Joe
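For reference, the instance and server counts implied above, as a sketch; it assumes the thread's round-number treatment of 1PB as ~1000TB, 2TB per instance, and 4 pods per worker node:

```python
import math

# Thread figures: ~1 PB of data (treated as ~1000 TB to match the
# 500-instance rule-of-thumb count), <= 2 TB per Cassandra instance,
# 4 instances (pods) per Kubernetes worker node.
data_tb = 1000
tb_per_instance = 2
pods_per_server = 4

instances = math.ceil(data_tb / tb_per_instance)   # 500
servers = math.ceil(instances / pods_per_server)   # 125
print(instances, servers)  # 500 125
```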