Re: Big Data Question

Joe Obernberger Thu, 17 Aug 2023 13:53:26 -0700

Was assuming reaper did incremental?  That was probably a bad assumption.


nodetool repair -pr
I know it well now!

:)

-Joe

On 8/17/2023 4:47 PM, Bowen Song via user wrote:

I don't have experience with Cassandra on Kubernetes, so I can't comment on that.
For repairs, may I interest you in incremental repairs? They will make repairs a hell of a lot faster. Of course, an occasional full repair is still needed, but that's another story.
On 17/08/2023 21:36, Joe Obernberger wrote:
Thank you.  Enjoying this conversation.
Agree on blade servers, where each blade has a small number of SSDs. Yeh/Nah to a kubernetes approach assuming fast persistent storage? I think that might be easier to manage.
In my current benchmarks, the performance is excellent, but the repairs are painful. I come from the Hadoop world where it was all about large servers with lots of disk.
Relatively small number of tables, but some have a high number of rows, 10bil+ - we use spark to run across all the data.
-Joe

On 8/17/2023 12:13 PM, Bowen Song via user wrote:
The optimal node size largely depends on the table schema and read/write pattern. In some cases 500 GB per node is too large, but in some other cases 10 TB per node works totally fine. It's hard to estimate that without benchmarking.
Again, just pointing out the obvious, you did not count the off-heap memory and page cache. 1 TB of RAM for 24 GB heap * 40 instances is definitely not enough. You'll most likely need between 1.5 and 2 TB of memory for 40x 24 GB heap nodes. You may be better off with blade servers than a single server with gigantic memory and disk sizes.
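To make that arithmetic concrete, here is a small sketch using only the figures quoted in this thread. Treating 1 TB as 1024 GB is an assumption, and the variable names are purely illustrative.

```python
instances = 40
heap_gb = 24
server_ram_gb = 1024  # assumption: 1 TB taken as 1024 GB

# Heap alone nearly fills the server.
total_heap_gb = instances * heap_gb             # 960
left_over_gb = server_ram_gb - total_heap_gb    # 64
left_per_instance_gb = left_over_gb / instances # about 1.6 GB each for off-heap and page cache

# The suggested 1.5 to 2 TB total, expressed per instance (heap included).
low_per_instance_gb = 1.5 * 1024 / instances    # about 38.4
high_per_instance_gb = 2.0 * 1024 / instances   # about 51.2

print(total_heap_gb, left_over_gb, left_per_instance_gb)
print(low_per_instance_gb, high_per_instance_gb)
```

In other words, the 1 TB server would leave well under 2 GB per instance outside the heap, while the 1.5 to 2 TB range corresponds to roughly 38 to 51 GB per instance in total.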
On 17/08/2023 15:46, Joe Obernberger wrote:
Thanks for this - yeah - duh - forgot about replication in my example!
So - is 2TBytes per Cassandra instance advisable? Better to use more/less? Modern 2u servers can be had with 24 3.8TByte SSDs; so assume 80TBytes per server, you could do:
(1024*3)/80 = 39 servers, but you'd have to run 40 instances of Cassandra on each server; maybe 24G of heap per instance, so a server with 1TByte of RAM would work.
Is this what folks would do?
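The numbers in this message can be checked with a few lines. This is only a sketch of the arithmetic as stated here; the variable names are illustrative, and the 80 TB usable figure is the message's own rounding down from the raw SSD capacity.

```python
import math

data_tb = 1024 * 3           # 1 PB at RF=3, taking 1 PB as 1024 TB as in the message
usable_per_server_tb = 80    # assumed usable space per server, as stated above
per_instance_tb = 2          # rule-of-thumb data per Cassandra instance

servers = math.ceil(data_tb / usable_per_server_tb)              # 38.4 rounds up to 39
instances_per_server = usable_per_server_tb // per_instance_tb   # 40
raw_per_server_tb = 24 * 3.8                                     # about 91.2 TB of raw SSD

print(servers, instances_per_server, raw_per_server_tb)
```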

-Joe

On 8/17/2023 9:13 AM, Bowen Song via user wrote:
Just pointing out the obvious, for 1PB of data on nodes with 2TB disk each, you will need far more than 500 nodes.
1, it is unwise to run Cassandra with replication factor 1. It usually makes sense to use RF=3, so 1PB of data will cost 3PB of storage space, a minimum of 1500 such nodes.
2, depending on the compaction strategy you use and the write access pattern, there's a disk space amplification to consider. For example, with STCS, the disk usage can be many times the actual live data size.
3, you will need some extra free disk space as temporary space for running compactions.
4, the data is rarely going to be perfectly evenly distributed among all nodes, and you need to take that into consideration and size the nodes based on the node with the most data.
5, enough of bad news, here's a good one. Compression will save you (a lot of) disk space!
With all the above considered, you probably will end up with a lot more than the 500 nodes you initially thought. Your choice of compaction strategy and compression ratio can dramatically affect this calculation.
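The five factors above can be folded into a rough back-of-the-envelope estimate. This is a minimal sketch, not a sizing tool: the function name `estimate_nodes` and all of its parameters are hypothetical, and apart from RF=3 and 2TB per node, any values you plug in are assumptions that only benchmarking can replace.

```python
import math

def estimate_nodes(
    live_data_tb: float,
    disk_per_node_tb: float,
    replication_factor: int = 3,
    space_amplification: float = 1.0,  # assumed: on-disk size / live size for the compaction strategy
    compression_ratio: float = 1.0,    # assumed: compressed size / uncompressed size (lower is better)
    free_space_fraction: float = 0.0,  # assumed: share of each disk kept free for compactions
    imbalance: float = 1.0,            # assumed: fullest node's data / average node's data
) -> int:
    """Rough node count; every factor other than RF is a placeholder."""
    # Total bytes that must land on disk across the cluster.
    on_disk_tb = (live_data_tb * replication_factor
                  * space_amplification * compression_ratio)
    # Size for the fullest node, leaving headroom for compaction.
    usable_per_node_tb = disk_per_node_tb * (1.0 - free_space_fraction) / imbalance
    return math.ceil(on_disk_tb / usable_per_node_tb)

# Replication alone, with 1PB taken as 1000TB and 2TB per node:
print(estimate_nodes(1000, 2))  # 1500

# Purely illustrative values for the remaining factors:
print(estimate_nodes(1000, 2, space_amplification=1.5, compression_ratio=0.5,
                     free_space_fraction=0.5, imbalance=1.1))
```

The multipliers are deliberately kept as separate parameters so that each one can be swapped for a measured value once benchmarks are available.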
On 16/08/2023 16:33, Joe Obernberger wrote:
General question on how to configure Cassandra. Say I have 1PByte of data to store. The general rule of thumb is that each node (or at least instance of Cassandra) shouldn't handle more than 2TBytes of disk. That means 500 instances of Cassandra.
Assuming you have very fast persistent storage (such as a NetApp, PorterWorx etc.), would using Kubernetes or some orchestration layer to handle those nodes be a viable approach? Perhaps the worker nodes would have enough RAM to run 4 instances (pods) of Cassandra, you would need 125 servers.
Another approach is to build your servers with 5 (or more) SSD devices - one for OS, four for each instance of Cassandra running on that server. Then build some scripts/ansible/puppet that would manage Cassandra start/stops, and other maintenance items.
Where I think this runs into problems is with repairs, or sstablescrubs that can take days to run on a single instance. How is that handled 'in the real world'? With seed nodes, how many would you have in such a configuration?
Thanks for any thoughts!

-Joe

