On Tue, Mar 9, 2010 at 7:15 AM, Sylvain Lebresne <sylv...@yakaz.com> wrote:
> 1) stress.py -t 10 -o read -n 50000000 -c 1 -r
> 2) stress.py -t 10 -o read -n 500000 -c 1 -r
>
> In case 1) I get around 200 reads/second, and that's pretty stable. The
> disk is spinning like crazy (~25% io_wait), very little cpu or memory is
> used, and performance is IO-bound, which is expected.
> In case 2) however, it starts with reasonable performance (400+
> reads/second), but it very quickly drops to an average of 80 reads/second.
By "reads" do you mean what stress.py counts (rows) or rows * columns? If it is rows, then you are still actually reading more columns/s in case 2. > And it don't go up significantly after > that. Turns out this seems to be a GC problem. Indeed, the info log (I'm > running trunk from today, but I first saw the problem on an older version of > trunk) show every few seconds lines like: > GC for ConcurrentMarkSweep: 4599 ms, 57247304 reclaimed leaving > 1033481216 used; max is 1211498496 First, use the 0.6 branch, not trunk. we're breaking stuff over there. What happens if you give the jvm 50% more ram? Are you using a 64-bit JVM? -Jonathan