The optimal node size largely depends on the table schema and read/write
pattern. In some cases 500 GB per node is too large, but in other
cases 10 TB per node works perfectly well. It's hard to estimate that
without benchmarking.
Again, just pointing out the obvious: you did not count the off-heap
memory and page cache. 40 instances with a 24GB heap each is already
960GB of heap alone, so 1TB of RAM is definitely not enough. You'll most
likely need between 1.5 and 2 TB of memory for 40x 24GB heap nodes. You
may be better off with blade servers than with a single server with
gigantic memory and disk sizes.
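To make the memory point concrete, here is a rough sketch that back-solves how much RAM each instance would have left for off-heap memory and page cache once the heap is subtracted. The function name is hypothetical, the 1 TB / 1.5 TB / 2 TB totals and the 40 x 24 GB layout come from this thread, and the even split across instances is an assumption made only for illustration.

```python
def non_heap_budget_gb(total_ram_gb, instances, heap_gb):
    """RAM left per instance for off-heap memory and page cache.

    Assumes RAM is split evenly across instances and ignores the OS
    itself, so the real budget would be somewhat smaller.
    """
    return total_ram_gb / instances - heap_gb

# Totals discussed in the thread, expressed in GB (1 TB = 1024 GB).
for total_tb in (1, 1.5, 2):
    budget = non_heap_budget_gb(total_tb * 1024, instances=40, heap_gb=24)
    print(f"{total_tb} TB RAM -> {budget:.1f} GB per instance outside the heap")
```

With 1 TB of RAM, each instance gets less than 2 GB beyond its heap, which is why that figure looks too small; the 1.5 to 2 TB range leaves roughly 14 to 27 GB per instance.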
On 17/08/2023 15:46, Joe Obernberger wrote:
Thanks for this - yeah - duh - forgot about replication in my example!
So - is 2TBytes per Cassandra instance advisable? Better to use
more/less? Modern 2U servers can be had with 24 3.8TByte SSDs; so
assume 80TBytes per server, you could do:
(1024*3)/80 = 38.4, so 39 servers, but you'd have to run 40 instances of
Cassandra on each server; maybe 24G of heap per instance, so a server
with 1TByte of RAM would work.
Is this what folks would do?
-Joe
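The arithmetic in this message can be checked with a few lines of Python. This is only a restatement of the numbers quoted above (1 PB taken as 1024 TB, RF=3, 80 TB usable per server, 2 TB and 24 GB of heap per instance); the variable names are illustrative.

```python
import math

data_tb = 1024               # 1 PByte of data, as in the message
replication_factor = 3
usable_tb_per_server = 80    # assumed usable capacity of a 24-SSD 2U server
tb_per_instance = 2          # rule-of-thumb disk per Cassandra instance
heap_gb_per_instance = 24

total_tb = data_tb * replication_factor
servers = math.ceil(total_tb / usable_tb_per_server)
instances_per_server = usable_tb_per_server // tb_per_instance
heap_gb_per_server = instances_per_server * heap_gb_per_instance

print(f"servers needed:        {servers}")
print(f"instances per server:  {instances_per_server}")
print(f"heap per server (GB):  {heap_gb_per_server}")
```

Note that the heap alone already comes close to 1 TByte per server, before any off-heap memory or page cache is counted.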
On 8/17/2023 9:13 AM, Bowen Song via user wrote:
Just pointing out the obvious, for 1PB of data on nodes with 2TB disk
each, you will need far more than 500 nodes.
1, it is unwise to run Cassandra with replication factor 1. It
usually makes sense to use RF=3, so 1PB of data will cost 3PB of storage
space, a minimum of 1500 such nodes.
2, depending on the compaction strategy you use and the write access
pattern, there's a disk space amplification to consider. For example,
with STCS (SizeTieredCompactionStrategy), the disk usage can be many
times the actual live data size.
3, you will need some extra free disk space as temporary space for
running compactions.
4, the data is rarely going to be perfectly evenly distributed among
all nodes, and you need to take that into consideration and size the
nodes based on the node with the most data.
5, enough bad news, here's some good news. Compression will save you
a lot of disk space!
With all the above considered, you probably will end up with a lot
more than the 500 nodes you initially thought. Your choice of
compaction strategy and compression ratio can dramatically affect
this calculation.
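The five points above can be folded into a back-of-the-envelope estimate. The sketch below is one possible way to combine them, not an established sizing formula: the function and parameter names are hypothetical, 1 PB is taken as 1000 TB to match the 1500-node figure, and every value in the second call (compression ratio, space amplification, compaction headroom, skew) is a made-up placeholder that would have to come from benchmarking.

```python
import math

def estimate_nodes(live_data_tb, replication_factor, node_disk_tb,
                   compression_ratio=1.0, space_amplification=1.0,
                   compaction_headroom=0.0, skew=1.0):
    """Rough node count for a given amount of live data.

    compression_ratio:   compressed size / uncompressed size (1.0 = none)
    space_amplification: on-disk size / live data size (compaction strategy)
    compaction_headroom: fraction of each disk kept free for compactions
    skew:                data on the fullest node / data on the average node
    """
    on_disk_tb = (live_data_tb * replication_factor
                  * compression_ratio * space_amplification)
    # Size for the fullest node: shrink usable space by headroom and skew.
    usable_tb_per_node = node_disk_tb * (1 - compaction_headroom) / skew
    return math.ceil(on_disk_tb / usable_tb_per_node)

# Point 1 only: RF=3 on 2 TB nodes.
baseline = estimate_nodes(1000, 3, 2)

# All five points, with placeholder values chosen purely for illustration.
adjusted = estimate_nodes(1000, 3, 2, compression_ratio=0.5,
                          space_amplification=1.5,
                          compaction_headroom=0.3, skew=1.2)

print(baseline, adjusted)
```

Changing any one of the placeholder values moves the result by hundreds of nodes, which is the point of the remark about compaction strategy and compression ratio.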
On 16/08/2023 16:33, Joe Obernberger wrote:
General question on how to configure Cassandra. Say I have 1PByte
of data to store. The general rule of thumb is that each node (or
at least instance of Cassandra) shouldn't handle more than 2TBytes
of disk. That means 500 instances of Cassandra.
Assuming you have very fast persistent storage (such as a NetApp,
Portworx etc.), would using Kubernetes or some orchestration layer
to handle those nodes be a viable approach? If the worker nodes
had enough RAM to run 4 instances (pods) of Cassandra each, you
would need 125 servers.
Another approach is to build your servers with 5 (or more) SSD
devices - one for the OS, and four for Cassandra, one per instance
running on that server. Then build some scripts/ansible/puppet that
would manage Cassandra start/stops, and other maintenance items.
Where I think this runs into problems is with repairs, or
sstablescrubs that can take days to run on a single instance. How is
that handled 'in the real world'? With seed nodes, how many would
you have in such a configuration?
Thanks for any thoughts!
-Joe