General question on how to configure Cassandra.  Say I have 1 PB of data to store.  The general rule of thumb is that each node (or at least each Cassandra instance) shouldn't handle more than 2 TB of disk.  That works out to 500 instances of Cassandra.
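For reference, here's the back-of-the-envelope math I'm working from (a minimal sketch in Python; the 2 TB ceiling is the rule of thumb above, and the replication factor of 3 is my assumption, before which the raw footprint triples):

# Rough capacity math. Assumptions: 2 TB usable per instance,
# 1 PB = 1000 TB, replication handled separately.
total_data_tb = 1000
max_per_instance_tb = 2

instances = total_data_tb // max_per_instance_tb
print(instances)        # 500 instances before replication

rf = 3                  # typical replication factor (assumption)
print(instances * rf)   # 1500 instances' worth of raw storage with RF=3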

Assuming you have very fast persistent storage (such as NetApp, Portworx, etc.), would using Kubernetes or some other orchestration layer to handle those nodes be a viable approach?  If each worker node had enough RAM to run 4 instances (pods) of Cassandra, you would need 125 servers.  Another approach is to build your servers with 5 (or more) SSD devices - one for the OS, one for each of the four Cassandra instances running on that server - and then build some scripts/Ansible/Puppet to manage Cassandra starts/stops and other maintenance items.
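To make the packing concrete, a quick sketch (the 4-instances-per-server density is my assumption, not a recommendation, and the device names are purely illustrative):

# Hypothetical packing: physical servers needed at a given density.
import math

instances_needed = 500
instances_per_server = 4   # assumes RAM/CPU headroom for 4 pods/instances

servers = math.ceil(instances_needed / instances_per_server)
print(servers)             # 125 servers

# Illustrative per-server SSD layout for the bare-metal approach:
devices = ["/dev/sda (OS)"] + [
    f"/dev/sd{chr(ord('b') + i)} (cassandra-{i})"
    for i in range(instances_per_server)
]
print(devices)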

Where I think this runs into problems is with repairs or sstablescrub runs, which can take days to complete on a single instance.  How is that handled 'in the real world'?  And with seed nodes, how many would you have in such a configuration?
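To make the concern concrete: even a naive one-node-at-a-time repair driver like the sketch below (hostnames and SSH setup are hypothetical) would serialize 500 multi-day primary-range repairs, which is clearly not workable without something smarter like a dedicated repair scheduler:

# Hypothetical rolling-repair loop: repair one node's primary token
# ranges at a time. Iterating -pr over every node covers the full
# ring exactly once, but strictly serialized it takes forever at 500 nodes.
import subprocess

nodes = [f"cassandra-{i}.example.internal" for i in range(500)]  # hypothetical

for node in nodes:
    result = subprocess.run(
        ["ssh", node, "nodetool", "repair", "-pr"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        print(f"repair failed on {node}: {result.stderr.strip()}")
        break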
Thanks for any thoughts!

-Joe

