In my experience #2 will work well up to the point where it triggers a limitation of Cassandra (slated to be resolved in 0.7 \o/): all of the columns under a given key must be able to fit in memory. For things like indexes of data I have opted to shard the keys for really large data sets to get around this until it's fixed.
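The sharding can be as simple as folding a shard number into the row key. This is just a rough sketch of that idea; the shard count, key format, and the cf.insert/cf.get calls in the comments are placeholders, not code from this thread:

    import zlib

    # Rough sketch: spread a wide index row across several keys so no
    # single row has to hold the whole index in memory. Shard count and
    # key format are arbitrary placeholders.
    NUM_SHARDS = 16

    def shard_key(base_key, column_name):
        # Deterministic shard derived from the column name, so a reader
        # can compute which shard holds a given column.
        shard = zlib.crc32(column_name.encode('utf-8')) % NUM_SHARDS
        return "%s:%d" % (base_key, shard)

    # Writes go to the shard key instead of the base key, e.g.
    #   cf.insert(shard_key("my_index", col_name), {col_name: value})
    # and a full scan of the index reads all NUM_SHARDS rows:
    #   for i in range(NUM_SHARDS):
    #       cf.get("%s:%d" % ("my_index", i))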
I suspect if you doubled the test for #2 once or twice you'd start seeing OOMs as well. Also, #2 will end up with a lumpy distribution around the cluster, since all the data under a given key needs to fit on one machine, while #1 will spread out a bit finer.

cheers,
jesse

--
jesse mcconnell
jesse.mcconn...@gmail.com

On Tue, Mar 9, 2010 at 07:15, Sylvain Lebresne <sylv...@yakaz.com> wrote:
> Hello,
>
> I've done some tests and it seems that having more rows with fewer
> columns each is better than having fewer rows with more columns, at least
> as far as read performance is concerned.
> Using stress.py, on a quad core 2.27GHz with 4GB of RAM and the out-of-the-box
> Cassandra configuration, I inserted:
>
> 1) 50000000 rows (that's 50 million) with 1 column each
>    (stress.py -n 50000000 -c 1)
> 2) 500000 rows (that's 500 thousand) with 100 columns each
>    (stress.py -n 500000 -c 100)
>
> That is, it ends up with 50 million columns in both cases (I use such big
> numbers so that in case 2 the resulting data is big enough not to fit in
> the system caches, otherwise the problem I'm mentioning below
> doesn't show).
> Those two 'tests' were done separately, with data flushed completely
> between them. I let Cassandra compact everything each time, shut down the
> server and started it again (so that no data is in a memtable). Then I tried
> reading columns, one at a time, using:
> 1) stress.py -t 10 -o read -n 50000000 -c 1 -r
> 2) stress.py -t 10 -o read -n 500000 -c 1 -r
>
> In case 1) I get around 200 reads/second and that's pretty stable. The
> disk is spinning like crazy (~25% io_wait), very little cpu or memory is used,
> and performance is IO bound, which is expected.
> In case 2) however, it starts with reasonable performance (400+
> reads/second), but it very quickly drops to an average of 80 reads/second
> (after a minute and a half or so), and it doesn't go up significantly after
> that. Turns out this seems to be a GC problem. Indeed, the info log (I'm
> running trunk from today, but I first saw the problem on an older version of
> trunk) shows, every few seconds, lines like:
>   GC for ConcurrentMarkSweep: 4599 ms, 57247304 reclaimed leaving
>   1033481216 used; max is 1211498496
> I'm not surprised that performance is bad with such GC pauses. I'm surprised
> to have such GC pauses at all.
>
> Note that in case 1) the resulting data 'weighs' ~14G, while in case 2) it
> 'weighs' only ~2.4G.
>
> Let me add that I used stress.py to try to identify the problem, but I first
> ran into it in an application I'm writing where I had rows with around 1000
> columns of 30K each. With about 1000 rows, I had awful performance, like 5
> reads/second on average. I tried switching to 1 million rows each having 1
> column of 30K and ended up with more than 300 reads/second.
>
> Any idea or insight? Am I doing something utterly wrong?
> Thanks in advance.
>
> --
> Sylvain
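For what it's worth, the remapping Sylvain describes at the end (folding the column name into the row key so that each row holds a single column) can be sketched roughly like this; the key format, names, and the client calls in the comments are placeholders, not actual code from the thread:

    # Rough sketch: turn one wide row (many ~30K columns) into many
    # narrow rows (one column each) by folding the column name into the
    # row key. The separator and names are placeholders.
    def narrow_key(wide_key, column_name):
        return "%s|%s" % (wide_key, column_name)

    # Wide model (the one that gave ~5 reads/second):
    #   cf.insert("doc42", {"chunk-0001": blob, "chunk-0002": blob, ...})
    # Narrow model (the one that gave 300+ reads/second):
    #   cf.insert(narrow_key("doc42", "chunk-0001"), {"data": blob})
    # A read then touches exactly one small row:
    #   cf.get(narrow_key("doc42", "chunk-0001"))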