General question on how to configure Cassandra. Say I have 1 PB of
data to store. The general rule of thumb is that each node (or at least
each instance of Cassandra) shouldn't handle more than 2 TB of disk.
That means 500 instances of Cassandra.
Assuming you have very fast persistent storage (such as a NetApp,
Portworx, etc.), would using Kubernetes or some other orchestration
layer to handle those nodes be a viable approach? If each worker node
had enough RAM to run 4 instances (pods) of Cassandra, you would need
125 servers.
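The sizing arithmetic above can be sketched in a few lines; the numbers
are just the ones from this message (1 PB taken as 1000 TB, replication
overhead ignored):

```python
# Back-of-the-envelope sizing using the figures from the message above.
total_data_tb = 1000        # 1 PB of data, expressed in TB (decimal)
max_tb_per_instance = 2     # rule-of-thumb disk limit per Cassandra instance
instances_per_server = 4    # pods/instances that fit on one worker node

instances = total_data_tb // max_tb_per_instance   # 500 Cassandra instances
servers = instances // instances_per_server        # 125 physical servers

print(f"{instances} instances across {servers} servers")
# Note: a replication factor > 1 would multiply the raw storage
# (and instance count) accordingly; it is ignored here, as above.
```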
Another approach is to build your servers with 5 (or more) SSD devices:
one for the OS, and one for each of four Cassandra instances running on
that server. Then build some scripts (shell/Ansible/Puppet) to manage
Cassandra starts/stops and other maintenance items.
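A minimal sketch of that per-server scripting idea, generating one
nodetool command per local instance. The JMX ports here are hypothetical
placeholders; a real setup would read them from each instance's
cassandra-env.sh:

```python
# Sketch: build per-instance nodetool maintenance commands for one server
# that runs several Cassandra instances, each on its own JMX port.
# Ports below are made-up examples, not a recommendation.

def maintenance_commands(action, jmx_ports):
    """Return one 'nodetool -p <port> <action>' command per instance."""
    return [f"nodetool -p {port} {action}" for port in jmx_ports]

# Four instances on one server, each with a distinct (hypothetical) JMX port.
ports = [7199, 7299, 7399, 7499]
for cmd in maintenance_commands("drain", ports):
    print(cmd)  # hand these to Ansible/ssh rather than running blindly
```

Something like this could be wrapped in an Ansible playbook that walks
the cluster one server at a time for rolling restarts.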
Where I think this runs into problems is with repairs or sstable
scrubs, which can take days to run on a single instance. How is that
handled 'in the real world'? And with seed nodes, how many would you
have in such a configuration?
Thanks for any thoughts!
-Joe