On Apr 13, 2010, at 1:48 PM, till wrote:

> On Tue, Apr 13, 2010 at 7:28 PM, Adam Kocoloski <[email protected]> wrote:
>> On Apr 13, 2010, at 12:39 PM, J Chris Anderson wrote:
>>
>>>
>>> On Apr 13, 2010, at 9:31 AM, till wrote:
>>>
>>>> Hey devs,
>>>>
>>>> I'm trying to compact a production database here (in the hope of
>>>> recovering some space), and made the following observations:
>>>>
>>>> * the set is 212+ million docs
>>>> * currently 0.8 TB in size
>>>> * the instance (XL) has 2 cores, one is idle, the other maybe
>>>>   utilized at 10%
>>>> * memory - 2 of 15 GB taken, no spikes
>>>> * io - well, it's EBS :(
>>>>
>>>> When I started _compact, read operations slowed down (I'll give you
>>>> 20 Mississippis for something that loads instantly otherwise).
>>>> Everything "eventually" worked, but it slowed down tremendously.
>>>>
>>>> I restarted the CouchDB process and everything is back to "snap".
>>>>
>>>> Does anyone have any insight on why that is the case?
>>>
>>> I'm guessing this is an EBS / EC2 issue. You are probably saturating
>>> the I/O pipeline. It's too bad there's not an easy way to 'nice' the
>>> compaction I/O.
>>>
>>> If you got unlucky and are on a particularly bad EBS / EC2 instance,
>>> you might do best to start up a new Couch in the same availability
>>> zone and replicate across to it. This will accomplish more or less
>>> the same effect as compaction.
>>>
>>>> Till
>>>
>>
>> I'm surprised it's _that_ bad. The compactor only submits one I/O to
>> EBS at a time, so I wouldn't expect other reads to be starved too
>> much. On the other hand, I'll bet compacting a DB that large takes at
>> least a month, especially if you used random IDs.
>>
>> Also, when you compact you're messing with the page cache something
>> fierce. At 212M docs you need every one of those 15 GB of RAM to keep
>> the btree nodes cached. The compactor a) reads nodes that your client
>> app may not have been touching and b) writes to a new file, which the
>> kernel starts to cache too. So it's a fairly brutal process from the
>> perspective of the page cache.
>
> I was looking at my fancy htop when it started to slow down and
> neither RAM nor CPUs were fully utilized. I mean, not even 50%. That's
> what surprises me.
That's not surprising to me. CouchDB doesn't do much active caching, but it
relies extensively on the page cache. Presumably
"cat /proc/meminfo | grep ^Cached" shows a value near 14 GB.

>> Does anyone have a sense of how deep a btree with 212M entries will be?
>> That is, how many pread calls are required to pull up a doc?
>>
>> Till, do you have iostat numbers from the compaction run?
>
> r...@box:~# iostat
> Linux 2.6.21.7-2.fc8xen (couchdb01.east1.aws.easybib.com)  04/13/2010  _x86_64_  (4 CPU)
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.91    0.00    0.08    8.43    1.38   89.21
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sdb               0.00         0.00         0.00       4578       1008
> sdc               0.00         0.00         0.00        776          0
> sdd               0.00         0.00         0.00        776          0
> sde               0.00         0.00         0.00        776          0
> sda1              0.16         1.09         4.12    3552818   13408432
> sdg              13.63       133.43       106.81  433818674  347266448
> sdh              13.54        94.30       212.11  306595821  689630885
> sdi              13.38        94.23       212.93  306366410  692284040
> sdk               1.91        46.01        73.82  149575695  239999486
> md0              27.04       188.53       425.04  612960367 1381916061

Those are the numbers since system boot. You'd need to specify an interval
afterwards to get the report during the compaction. I also like the -x flag.
So something like "iostat -x 4" will give you one report from system boot,
then subsequent reports at 4-second intervals showing the stats from that
time period.

Best,
Adam
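
(For anyone following along, here is a minimal sketch of the monitoring Adam
and till are talking about. It assumes a Linux box with sysstat installed;
the 4-second interval and the md0 device are just the values that appear in
this thread, not requirements.)

    # Size of the page cache, which CouchDB relies on for its btree nodes
    grep ^Cached /proc/meminfo

    # Extended per-device stats: the first report covers time since boot,
    # each following report covers the previous 4-second interval
    iostat -x 4

    # Same, but limited to the array backing the database files
    iostat -x md0 4

During a compaction run, the await and %util columns of the -x output are
the ones that show whether the EBS volumes are saturated.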

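(As a footnote to Chris's suggestion of replicating instead of compacting:
against CouchDB's HTTP API that is roughly the following calls. The host
names and the database name "bigdb" are made up for illustration.)

    # Create the target database on the fresh node in the same AZ
    curl -X PUT http://new-node:5984/bigdb

    # Push-replicate to it; the target ends up compact because it is
    # written from scratch
    curl -X POST -H "Content-Type: application/json" \
         -d '{"source":"bigdb","target":"http://new-node:5984/bigdb"}' \
         http://localhost:5984/_replicate

    # For comparison, this is the call that kicks off in-place compaction
    curl -X POST -H "Content-Type: application/json" \
         http://localhost:5984/bigdb/_compact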