On Apr 13, 2010, at 1:48 PM, till wrote:

> On Tue, Apr 13, 2010 at 7:28 PM, Adam Kocoloski <[email protected]> wrote:
>> On Apr 13, 2010, at 12:39 PM, J Chris Anderson wrote:
>> 
>>> 
>>> On Apr 13, 2010, at 9:31 AM, till wrote:
>>> 
>>>> Hey devs,
>>>> 
>>>> I'm trying to compact a production database here (in the hope of
>>>> recovering some space), and made the following observations:
>>>> 
>>>> * the set is 212+ million docs
>>>> * currently 0.8 TB in size
>>>> * the instance (XL) has 2 cores; one is idle, the other at maybe 10%
>>>> * memory - 2 of 15 GB taken, no spikes
>>>> * io - well it's EBS :(
>>>> 
>>>> When I started _compact, read operations slowed down (I'll give you 20
>>>> Mississippis for something that otherwise loads instantly).
>>>> Everything "eventually" worked, but it slowed down tremendously.
>>>> 
>>>> I restarted the CouchDB process and everything is back to "snap".
>>>> 
>>>> Does anyone have any insight on why that is the case?
>>> 
>>> I'm guessing this is an EBS / EC2 issue. You are probably saturating the IO 
>>> pipeline. It's too bad there's not an easy way to 'nice' the compaction IO.
>>> 
>>> If you got unlucky and are on a particularly bad EBS / EC2 instance, you 
>>> might do best to start up a new Couch in the same availability zone and 
>>> replicate across to it. This will accomplish more-or-less the same effect 
>>> as compaction.
>>> 
>>>> 
>>>> Till
>>> 
>> 
>> I'm surprised it's _that_ bad.  The compactor only submits one I/O to EBS at 
>> a time, so I wouldn't expect other reads to be starved too much.  On the 
>> other hand, I'll bet compacting a DB that large takes at least a month, 
>> especially if you used random IDs.
>> 
>> Then again, when you compact you're messing with the page cache 
>> something fierce.  At 212M docs you need every one of those 15 GB of RAM to 
>> keep the btree nodes cached.  The compactor a) reads nodes that your client 
>> app may not have been touching and b) writes to a new file and the kernel 
>> starts to cache that too.  So it's a fairly brutal process from the 
>> perspective of the page cache.
> 
> I was looking at my fancy htop when it started to slow down and
> neither the RAM nor the CPUs were fully utilized. I mean, not even 50%.
> That's what surprises me.

That's not surprising to me.  CouchDB doesn't do much active caching, but it 
relies extensively on the page cache.  Presumably "cat /proc/meminfo | grep 
^Cached" shows a value near 14 GB.

>> Does anyone have a sense of how deep a btree with 212M entries will be?  
>> That is, how many pread calls are required to pull up a doc?
>> 
>> Till, do you have iostat numbers from the compaction run?
> 
> r...@box:~# iostat
> Linux 2.6.21.7-2.fc8xen (couchdb01.east1.aws.easybib.com)     04/13/2010     _x86_64_        (4 CPU)
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           0.91    0.00    0.08    8.43    1.38   89.21
> 
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sdb               0.00         0.00         0.00       4578       1008
> sdc               0.00         0.00         0.00        776          0
> sdd               0.00         0.00         0.00        776          0
> sde               0.00         0.00         0.00        776          0
> sda1              0.16         1.09         4.12    3552818   13408432
> sdg              13.63       133.43       106.81  433818674  347266448
> sdh              13.54        94.30       212.11  306595821  689630885
> sdi              13.38        94.23       212.93  306366410  692284040
> sdk               1.91        46.01        73.82  149575695  239999486
> md0              27.04       188.53       425.04  612960367 1381916061

Those are the cumulative numbers since system boot.  You'd need to specify an 
interval argument to get reports covering the compaction itself.  I also like 
the -x flag.  So something like

iostat -x 4

will give you one report covering everything since boot, then subsequent 
reports at 4-second intervals showing the stats for just that window.
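
In the -x output, the columns to watch are await (the average time each 
request spends queued and being serviced, in ms) and %util; if %util is pegged 
near 100 while the compactor runs, that's the saturated pipeline Chris 
mentioned.

Best,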

Adam
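
P.S. Back-of-envelope on my btree question: if the inner nodes average a 
fanout of around 100 (a guess; it depends on key size and the chunkify 
threshold), then log_100(212,000,000) ≈ 4.2, so figure a tree 4-5 levels deep 
and 4-5 preads to pull up an uncached doc.

P.P.S. If you do take Chris up on the replicate-across idea, the trigger is 
just a POST to _replicate.  A sketch, with placeholder hostnames and db names 
(create_target wants 0.11 or newer, IIRC):

    curl -X POST http://newbox:5984/_replicate \
         -H 'Content-Type: application/json' \
         -d '{"source":"http://oldbox:5984/yourdb","target":"yourdb","create_target":true}'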
