On Thu, Feb 25, 2010 at 3:54 PM, Anthony Molinaro
<antho...@alumni.caltech.edu> wrote:
> What about the case where cpu and ram are underutilized, and your bottleneck
> is disk io (which seems to often be the case in ec2), then adding more
> spindles improves overall throughput of the system.  I've actually tested
> this when adding an additional ebs, and hand moving files around, then
> restarting.  Suddenly the node's performance (measured via cfstats metrics)
> gets better

That sounds like you're actually ram-limited, so adding nodes will be
better than adding EBS devices.

> How do
> the files ever get that big, does a repair fully compact (ie, down
> to one file)?  I guess the question is how do you end up with the
> "worst" case?

Any major compaction will do that.  Repair will invoke one, or it can
happen "naturally" too.

> I guess Raid0 is the only way to use multiple disks efficiently and
> the multiple DataFileDirectories is really not very useful?  I'm
> trying to think of a good reason you might want multiple data directories
> and I can't think of one now, is there a good reason?

If you are throughput-bound instead of size-bound (which is the case
for most uses, especially on non-virtual hardware), then I would
expect better performance from JBOD.

-Jonathan
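
To make the size-bound vs. throughput-bound distinction concrete, here is a rough sketch of the largest single data file each layout could hold. It assumes a data file lives wholly inside one data directory, that classic RAID0 striping is capped at the smallest disk times the disk count, and it ignores filesystem overhead and the scratch space a compaction needs. The function name and the disk sizes are made up for illustration.

```python
def largest_file_capacity_gb(disk_sizes_gb, layout):
    """Upper bound (GB) on a single data file for a given disk layout.

    Hypothetical helper for illustration only; not part of Cassandra.
    """
    if not disk_sizes_gb:
        raise ValueError("need at least one disk")
    if layout == "raid0":
        # Assumption: striping uses an equal share of every disk, so the
        # volume is bounded by the smallest disk times the disk count.
        return min(disk_sizes_gb) * len(disk_sizes_gb)
    if layout == "jbod":
        # Assumption: one data directory per disk, and a file cannot span
        # directories, so the biggest disk sets the limit.
        return max(disk_sizes_gb)
    raise ValueError("unknown layout: %r" % (layout,))


disks = [100, 100, 100, 100]  # four hypothetical 100 GB volumes
raid0_limit = largest_file_capacity_gb(disks, "raid0")
jbod_limit = largest_file_capacity_gb(disks, "jbod")
```

Under these assumptions, a fully compacted file larger than one disk only fits on the RAID0 layout, which is the size-bound case. If your data stays well under a single disk, that limit never bites and the choice comes down to throughput.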