With the file sizes we're talking about with Cassandra and other
database products, the stripe size doesn't seem to matter. I suppose
there may be a modicum of overhead with a small stripe size, but I'm not
sure. Mine is set to 128k, which produced the same results as 16k and 256k.
I will say the number of drives within the RAID 0 setup does seem to
matter. The more you have, the more parallelism you can get with a good
RAID controller.
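
As a rough illustration of why that might be, here is a minimal Python
sketch of the usual RAID 0 layout, where stripe units are dealt out
round-robin across the disks. The function names (locate, bytes_per_disk),
the 3-disk array, and the 96 MiB read are assumptions made up for the
example; a real controller may lay data out differently and this does
not model seek time or controller overhead.

```python
KIB = 1024
MIB = 1024 * KIB


def locate(offset, stripe_size, num_disks):
    """Map a logical byte offset to (disk index, byte offset on that disk)."""
    stripe_index = offset // stripe_size      # which stripe unit overall
    disk = stripe_index % num_disks           # units rotate across the disks
    row = stripe_index // num_disks           # full rows already laid down
    return disk, row * stripe_size + offset % stripe_size


def bytes_per_disk(start, length, stripe_size, num_disks):
    """Count how many bytes of one contiguous read land on each disk."""
    counts = [0] * num_disks
    pos, end = start, start + length
    while pos < end:
        disk, _ = locate(pos, stripe_size, num_disks)
        # Read up to the end of the current stripe unit (or the end of the read).
        chunk = min(stripe_size - pos % stripe_size, end - pos)
        counts[disk] += chunk
        pos += chunk
    return counts


if __name__ == "__main__":
    # Hypothetical large sequential read over a 3-disk RAID 0 set.
    for stripe in (16 * KIB, 128 * KIB, 256 * KIB):
        per_disk = bytes_per_disk(0, 96 * MIB, stripe, 3)
        print(f"{stripe // KIB}k stripe:", [n // MIB for n in per_disk], "MiB per disk")
```

In this simplified model a large read is spread evenly over every disk
whichever of the three stripe sizes is used, so the disk count is what
sets how much work can proceed in parallel.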
Eric Rosenberry wrote:
Based on the documentation, it is clear that with Cassandra you want
to have one disk for commitlog, and one disk for data.
My question is: If you think your workload is going to require more I/O
performance from the data disks than a single disk can handle, how would
you recommend effectively utilizing additional disks?
It would seem a number of vendors sell 1U boxes with four 3.5 inch
disks. If we use one for commitlog, is there a way to have Cassandra
itself split data equally across the three remaining disks? Or is
this something that needs to be handled at the hardware level, or
at the operating system/file system level?
Options include a hardware RAID controller in a RAID 0 stripe (this is
more $$$ and for what gain?), or utilizing a volume manager like LVM.
Along those same lines, if you do implement some type of striping,
what RAID stripe size is recommended? (I think Todd Burruss asked
this earlier but I did not see a response)
Thanks for any input!
-Eric