General question on how to configure Cassandra. Say I have 1 PB of
data to store. The general rule of thumb is that each node (or at least
each instance of Cassandra) shouldn't handle more than 2 TB of disk.
That means 500 instances of Cassandra.
Assuming you have very fast persistent storage (such as a NetApp,
Portworx, etc.), would using Kubernetes or some other orchestration
layer to handle those nodes be a viable approach? If each worker node
had enough RAM to run 4 instances (pods) of Cassandra, you would need
125 servers.
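The sizing arithmetic above can be sketched in a few lines; the numbers
are just the ones from this message (1 PB taken as 1000 TB, replication
overhead ignored):

```python
# Back-of-the-envelope sizing using the figures from the message above.
total_data_tb = 1000        # 1 PB of data, expressed in TB (decimal)
max_tb_per_instance = 2     # rule-of-thumb disk limit per Cassandra instance
instances_per_server = 4    # pods/instances that fit on one worker node

instances = total_data_tb // max_tb_per_instance   # 500 Cassandra instances
servers = instances // instances_per_server        # 125 physical servers

print(f"{instances} instances across {servers} servers")
# Note: a replication factor > 1 would multiply the raw storage
# (and instance count) accordingly; it is ignored here, as above.
```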
Another approach is to build your servers with 5 (or more) SSD devices:
one for the OS, and one for each of four Cassandra instances running on
that server. Then build some scripts (shell/Ansible/Puppet) to manage
Cassandra starts/stops and other maintenance items.
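A minimal sketch of that per-server scripting idea, generating one
nodetool command per local instance. The JMX ports here are hypothetical
placeholders; a real setup would read them from each instance's
cassandra-env.sh:

```python
# Sketch: build per-instance nodetool maintenance commands for one server
# that runs several Cassandra instances, each on its own JMX port.
# Ports below are made-up examples, not a recommendation.

def maintenance_commands(action, jmx_ports):
    """Return one 'nodetool -p <port> <action>' command per instance."""
    return [f"nodetool -p {port} {action}" for port in jmx_ports]

# Four instances on one server, each with a distinct (hypothetical) JMX port.
ports = [7199, 7299, 7399, 7499]
for cmd in maintenance_commands("drain", ports):
    print(cmd)  # hand these to Ansible/ssh rather than running blindly
```

Something like this could be wrapped in an Ansible playbook that walks
the cluster one server at a time for rolling restarts.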
Where I think this runs into problems is with repairs or sstable
scrubs, which can take days to run on a single instance. How is that
handled 'in the real world'? And with seed nodes, how many would you
have in such a configuration?
Thanks for any thoughts!
-Joe