On 17/05/2009, at 4:20 AM, Adam Kocoloski wrote:

Ok, so here's a start at reworking some of the memory management and buffering calculations. It fixes the regression where attachment memory wasn't being included in the memory utilization numbers, and it also includes ibrowse memory utilization for attachments (which is larger than Couch's).

The decision to flush the buffer (to disk or to the remote target server) depends on the number of docs in the buffer, the approximate number of attachments, and the memory utilization. I estimate the number of attachments as 0.5*nlinks, since every attachment download spawns two processes: one dedicated ibrowse worker and the attachment receiver. The dedicated ibrowse workers take the attachment downloads out of the connection pool and let us keep a better eye on their memory usage.
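
To make the flush decision above concrete, here is a rough Python sketch. It is not the replication module's code (which is Erlang). The threshold values and the names `should_flush`, `estimate_attachments`, `MAX_DOCS`, `MAX_ATTACHMENTS` and `MAX_MEMORY_BYTES` are all made up for illustration, and it assumes that crossing any one threshold triggers a flush; only the 0.5*nlinks estimate comes from the description.

```python
# Hypothetical thresholds. In the description these are macros at the top
# of the replication module; the numbers here are invented for illustration.
MAX_DOCS = 100
MAX_ATTACHMENTS = 50
MAX_MEMORY_BYTES = 10 * 1024 * 1024


def estimate_attachments(nlinks):
    # Each attachment download spawns two linked processes (a dedicated
    # ibrowse worker and an attachment receiver), so halve the link count.
    return 0.5 * nlinks


def should_flush(ndocs, nlinks, memory_bytes):
    """Return True if any one threshold is reached (assumed OR semantics)."""
    return (
        ndocs >= MAX_DOCS
        or estimate_attachments(nlinks) >= MAX_ATTACHMENTS
        or memory_bytes >= MAX_MEMORY_BYTES
    )
```

For example, `should_flush(10, 4, 2048)` stays under all three limits, while a buffer holding `MAX_DOCS` documents flushes regardless of the other two inputs.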

Each of the thresholds is currently just defined as a macro at the top of the module. I haven't done any work on adjusting these thresholds dynamically or checkpointing as a function of elapsed time.

The replication module is getting pretty hairy again; in my opinion it's probably time to refactor out the attachment stuff into its own module. I may get around to that tomorrow if no one objects.

What do you think about adding binary exponential backoff to help with unreliable links? Even if attachments are buffered to disk, there's still the issue of making checkpoint progress in the face of link failure. Or maybe checkpoint the buffer on any failure (although that won't help in the situation where CouchDB quits).
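
For reference, a minimal sketch of what backoff with doubling delays could look like, in Python rather than Erlang. The names `backoff_delays` and `retry_with_backoff`, the default delays, and the choice to retry only on `IOError` are assumptions for illustration, not anything in the replicator.

```python
import time


def backoff_delays(base=1.0, cap=60.0, max_retries=6):
    """Yield delays that double on each retry, capped at `cap` seconds."""
    for attempt in range(max_retries):
        yield min(cap, base * (2 ** attempt))


def retry_with_backoff(operation, base=1.0, cap=60.0, max_retries=6,
                       sleep=time.sleep):
    """Call `operation`, sleeping with a doubling delay after each failure.

    Makes up to max_retries + 1 attempts; the last failure propagates.
    """
    for delay in backoff_delays(base, cap, max_retries):
        try:
            return operation()
        except IOError:
            # Link failure (assumed to surface as IOError): wait, then retry.
            sleep(delay)
    # Final attempt; any exception here is raised to the caller.
    return operation()
```

The `sleep` parameter is only there so the waiting can be stubbed out; a checkpoint could be taken at the same point where the sketch sleeps.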

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Isn't it enough to see that a garden is beautiful without having to believe that there are fairies at the bottom of it too?
  -- Douglas Adams
