One more thought about Martin's suggestion: is it possible to put the data files into multiple directories located on different physical disks? That should help alleviate the I/O bottleneck.
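For reference, here is a minimal sketch of what I have in mind in storage-conf.xml (assuming the 0.5/0.6-era XML config format; the paths are just placeholder mount points, one per physical disk):

    <DataFileDirectories>
        <!-- each directory lives on its own physical disk (example paths) -->
        <DataFileDirectory>/disk1/cassandra/data</DataFileDirectory>
        <DataFileDirectory>/disk2/cassandra/data</DataFileDirectory>
    </DataFileDirectories>

My understanding is that Cassandra picks among the listed directories when it writes new sstables, so reads and compaction should end up spread across the disks -- please correct me if that's wrong.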
Has anybody tested the row-caching feature in trunk (shooting for 0.6)?

-Weijun

On Tue, Feb 16, 2010 at 9:50 AM, Weijun Li <weiju...@gmail.com> wrote:
> Dumped 50 million records into my 2-node cluster overnight and made sure
> there aren't many data files (only around 30), per Martin's suggestion. The
> size of the data directory is 63GB. Now when I read records from the cluster,
> the read latency is still ~44ms -- and there is no write happening during the
> read. iostat shows that the disk (RAID10, 4x 250GB 15k SAS) is saturated:
>
> Device:  rrqm/s  wrqm/s    r/s    w/s    rsec/s   wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
> sda       47.67   67.67  190.33  17.00  23933.33   677.33    118.70      5.24  25.25   4.64  96.17
> sda1       0.00    0.00    0.00   0.00      0.00     0.00      0.00      0.00   0.00   0.00   0.00
> sda2      47.67   67.67  190.33  17.00  23933.33   677.33    118.70      5.24  25.25   4.64  96.17
> sda3       0.00    0.00    0.00   0.00      0.00     0.00      0.00      0.00   0.00   0.00   0.00
>
> CPU usage is low.
>
> Does this mean disk I/O is the bottleneck in my case? Will it help if I
> increase KCF (KeysCachedFraction) to cache the entire sstable index?
>
> Also, this is almost a read-only test; in reality our write/read ratio is
> close to 1:1, so I'm guessing read latency will go even higher in that case,
> because it will be difficult for Cassandra to find a good moment to compact
> data files that are busy being written.
>
> Thanks,
> -Weijun
>
> On Tue, Feb 16, 2010 at 6:06 AM, Brandon Williams <dri...@gmail.com> wrote:
>
>> On Tue, Feb 16, 2010 at 2:32 AM, Dr. Martin Grabmüller <
>> martin.grabmuel...@eleven.de> wrote:
>>
>>> In my tests I have observed that good read latency depends on keeping
>>> the number of data files low. In my current test setup, I have stored
>>> 1.9 TB of data on a single node, which is in 21 data files, and read
>>> latency is between 10 and 60ms (for small reads; larger reads of course
>>> take more time). In earlier stages of my test, I had up to 5000
>>> data files, and read performance was quite bad: my configured 10-second
>>> RPC timeout was regularly encountered.
>>>
>> I believe it is known that crossing sstables is O(N log N), but I'm unable to
>> find the ticket on this at the moment. Perhaps Stu Hood will jump in and
>> enlighten me, but in any case I believe
>> https://issues.apache.org/jira/browse/CASSANDRA-674 will eventually solve it.
>>
>> Keeping write volume low enough that compaction can keep up is one
>> solution, and throwing hardware at the problem is another, if necessary.
>> Also, the row caching in trunk (soon to be 0.6, we hope) helps greatly for
>> repeated hits.
>>
>> -Brandon
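P.S. For anyone who wants to try the row caching Brandon mentions above: my understanding is that in trunk/0.6 it is enabled per column family via a RowsCached attribute in storage-conf.xml, something like the sketch below (keyspace and CF names are just placeholders, and the exact attribute name/semantics may still change in trunk):

    <Keyspace Name="Keyspace1">
        <!-- keep up to 100,000 rows of this CF in the in-memory row cache (illustrative value) -->
        <ColumnFamily Name="Standard1" RowsCached="100000"/>
    </Keyspace>

If anyone has numbers on how much this helps for repeated reads under a workload like the one above, I'd be very interested.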