Re: Advice on architecture

2012-03-28 Thread Radim Kolar



RAID0 would help me use more efficiently the total disk space available at each 
node, but tests have shown that under write load it behaves much worse than 
using separate data dirs, one per disk.
There are different strategies for how RAID0 splits reads, and changing the I/O 
scheduler and filesystem also helps. I found that ZFS/ZRAID works best; 
backups are especially good with it. If you don't plan to do backups, ext4 is 
not bad either, but compactions are rather slow on it.

  I used a 3-node cluster, and the node with RAID0 kept getting behind the 
other two nodes which had separate data dirs. The problem with separate data 
dirs is that it seems to be difficult for Cassandra to use the space 
efficiently due to the compactions.
If you need to think about free disk space on the nodes, then you do not 
have enough storage. TB drives are cheap today; buy some. A cluster should 
not be designed on the basis of "we will be lucky if all our data fits there 
and we do not run out of space during major compactions".
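
As a rough illustration of the headroom point above, here is a minimal sketch of a capacity check. The factor of 2 is an assumption (a worst case where a major compaction rewrites all live data while the old SSTables are still on disk), and the function name and disk sizes are hypothetical.

```python
def has_compaction_headroom(disk_capacity_gb, data_gb, headroom_factor=2.0):
    """Return True if the disk can hold the live data plus the temporary
    copy written during a major compaction.

    headroom_factor=2.0 is an assumed worst case: the compacted output can
    be as large as the input while the old SSTables still exist on disk.
    """
    return data_gb * headroom_factor <= disk_capacity_gb


# Hypothetical node with a 1000 GB data disk.
print(has_compaction_headroom(1000, 400))  # needs 800 GB at worst
print(has_compaction_headroom(1000, 600))  # needs 1200 GB at worst
```

With these made-up numbers the first node fits and the second does not, which is the "lucky if it fits" situation the reply warns against.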

  I first tried the new Leveled compactions scheme, which seemed promising since it would 
create small files that could be scattered by the data dirs, but the IO 
necessary for this compaction scheme is enormous under write load.
Yes, it is meant for mostly read-only apps, but raising the base table size to 
something larger, like 50 MB, helps.
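
To make the effect of a larger table size concrete, here is a small sketch. It assumes that "base table size" refers to the per-SSTable size used by leveled compaction, that the default at the time was 5 MB, and that each level holds 10 times as many SSTables as the one before it. The 500 GB data size and the function names are hypothetical.

```python
import math


def sstable_count(data_mb, sstable_size_mb):
    """Approximate number of fixed-size SSTables needed to hold data_mb."""
    return math.ceil(data_mb / sstable_size_mb)


def levels_needed(data_mb, sstable_size_mb, fanout=10):
    """Smallest number of levels whose combined capacity holds all SSTables,
    assuming level i holds fanout**i SSTables (an assumed model)."""
    tables = sstable_count(data_mb, sstable_size_mb)
    level, capacity = 0, 0
    while capacity < tables:
        level += 1
        capacity += fanout ** level
    return level


DATA_MB = 500_000  # hypothetical node holding roughly 500 GB
for size_mb in (5, 50):
    print(size_mb, sstable_count(DATA_MB, size_mb), levels_needed(DATA_MB, size_mb))
```

Under these assumptions, going from 5 MB to 50 MB cuts the file count tenfold and removes one level, so there are fewer files to shuffle between levels. Actual I/O savings depend on the workload.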

Am I missing something here? Is this the best way to deal with this (abnormal) 
use case?
It takes time to learn how to tune Cassandra properly. If you do not 
have the time, hire somebody who will do it for you. It took me a few months 
to master, and it is kind of difficult to explain over email.


Re: Advice on architecture

2012-03-28 Thread Igor

On 03/28/2012 02:04 PM, Radim Kolar wrote:


RAID0 would help me use more efficiently the total disk space 
available at each node, but tests have shown that under write load it 
behaves much worse than using separate data dirs, one per disk.
there are different strategies how RAID0 splits reads, also changing 
io scheduler and filesystem helps. I found that ZFS/ZRAID is best, 
especially backups are very good. If you dont plan to do backups ext4 
is not bad either, but compactions are rather slow on it.


I'm also trying to evaluate different RAID0 strategies for the drives holding 
Cassandra data. If I need 2T of space to keep a node's tables, which 
drive configuration is better: 2 x 1T drives or 4 x 500G drives? Which 
stripe size is optimal? Should I use hardware RAID, or is Linux software RAID 
OK? I am mostly concerned with read performance.





Re: Advice on architecture

2012-03-28 Thread Mateusz Korniak
On Wednesday 28 of March 2012, Igor wrote:
 I'm also trying to evaluate different strategies for RAID0 as drive for
 cassandra data storage. If I need 2T space to keep node tables, which
 drive configuration is better: 1T x 2drives or 500G x 4drives? 

With drives from a _similar_ family, four smaller disks should be about twice 
as fast on reads as two bigger ones, since RAID0 spreads the reads over twice 
as many spindles.
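
The arithmetic behind that estimate can be sketched as follows. This assumes reads are striped perfectly evenly across the array and that every drive delivers the same throughput regardless of capacity. The per-disk IOPS figure is a made-up placeholder, not a measurement.

```python
def raid0_read_iops(n_disks, per_disk_iops):
    """Idealized aggregate read IOPS of a RAID0 array: every disk serves
    an equal share of the reads (assumed, real arrays fall short)."""
    return n_disks * per_disk_iops


PER_DISK_IOPS = 100  # hypothetical figure for a single HDD

two_big = raid0_read_iops(2, PER_DISK_IOPS)     # 2 x 1T
four_small = raid0_read_iops(4, PER_DISK_IOPS)  # 4 x 500G
print(four_small / two_big)
```

The ratio depends only on the spindle count in this model, which is why the drives need to be from a similar family for the estimate to hold.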


 Should I use hardware raid or linux raid is ok?
Instead of buying hardware RAID controllers, buy more disks or nodes; that 
should give a bigger performance gain.

 I mostly concerned with read performance.


-- 
Mateusz Korniak
(...) I have a brother: serious, a homebody, a penny-pincher, a hypocrite, 
sanctimonious, in short, a pillar of society.
Nikos Kazantzakis, Zorba the Greek
