Hello.

I'm developing a system that will require me to store large (<=4MB) columns
in Cassandra. Right now I'm storing one column per row, in a single CF. The
machines at my disposal have 32GB of RAM and 10 SATA drives each. I would
prefer a larger number of smaller nodes, but this is what I have to work
with. The issues I'm weighing are RAID0 vs. separate data dirs, and
SizeTiered vs. Leveled compaction. I will have roughly twice as many writes
as reads.
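
For concreteness, the current write path looks roughly like the sketch below
(a minimal example assuming the pycassa client; the keyspace, CF and column
names are placeholders, not my real schema):

    import pycassa

    # Single CF, one large blob column per row (values up to ~4MB).
    pool = pycassa.ConnectionPool('MyKeyspace', ['node1:9160'])
    blobs = pycassa.ColumnFamily(pool, 'Blobs')

    def store_blob(row_key, blob_bytes):
        # The whole value goes into one column named 'data'.
        blobs.insert(row_key, {'data': blob_bytes})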

RAID0 would let me use the total disk space available at each node more
efficiently, but my tests have shown that under write load it behaves much
worse than separate data dirs, one per disk. I used a 3-node cluster, and the
node with RAID0 kept falling behind the other two nodes, which had separate
data dirs. The problem with separate data dirs is that Cassandra seems to
have trouble using the space efficiently because of compactions. I first
tried the new Leveled compaction strategy, which seemed promising since it
creates "small" SSTables that can be spread across the data dirs, but the IO
this strategy requires under write load is enormous. Compaction was running
constantly and it hurt write throughput because it slowed the flushing of
memtables. I then tried SizeTiered compaction, and it performed better, but
since it tends to create large SSTables, these cannot be split across the
multiple data dirs.
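
The separate-data-dirs setup is just the standard cassandra.yaml mechanism,
along the lines of the excerpt below (the mount points are illustrative, not
my actual layout):

    # cassandra.yaml -- one data dir per physical disk
    data_file_directories:
        - /mnt/disk01/cassandra/data
        - /mnt/disk02/cassandra/data
        - /mnt/disk03/cassandra/data
        # ... and so on for the remaining disks
    # commit log shown on its own disk in this example
    commitlog_directory: /mnt/disk10/cassandra/commitlog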

What I'm thinking of doing now is using multiple data dirs with SizeTiered
compaction, and dividing the input data across several (64) different CFs.
This way smaller SSTables will be created, and these can be split across the
multiple data dirs. This should let me make better use of the available
capacity, and I won't need as much free space for compactions as I would if
the SSTables were larger.
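
The routing to a CF would be something simple and deterministic, along these
lines (just a sketch; the 'Blobs_NN' naming and the md5-based hash are
placeholders for whatever scheme ends up being used):

    import hashlib

    NUM_CFS = 64

    def cf_for_key(row_key):
        # Deterministically map a row key to one of the 64 CFs
        # (Blobs_00 .. Blobs_63) so writes spread evenly across them.
        h = int(hashlib.md5(row_key.encode('utf-8')).hexdigest(), 16)
        return 'Blobs_%02d' % (h % NUM_CFS)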

Am I missing something here? Is this the best way to deal with this (abnormal) 
use case?

Thanks and best regards,
André Cruz
