On 17/05/2009, at 4:20 AM, Adam Kocoloski wrote:
Ok, so here's a start at reworking some of the memory management and
buffering calculations. It fixes the regression where attachment
memory wasn't being included in the memory utilization numbers, and
it also includes ibrowse memory utilization for attachments (which
is larger than Couch's).
The decision to flush the buffer (to disk or to the remote target
server) is dependent on the number of docs in the buffer, the
approximate number of attachments, and the memory utilization. I
estimate the number of attachments as 0.5*nlinks, since every
attachment download spawns two processes: one dedicated ibrowse
worker and the attachment receiver. The dedicated ibrowse workers
get the attachments out of the connection pool and let us keep a
better eye on their memory usage.
Each of the thresholds is currently just defined as a macro at the
top of the module. I haven't done any work on adjusting these
thresholds dynamically or checkpointing as a function of elapsed time.
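
For illustration only, the flush decision described above might look roughly like the sketch below. It is written in Python rather than Erlang, and it is not the actual module code: the threshold names and values are hypothetical placeholders, and combining the three conditions with a plain "or" is an assumption, since the description only says the decision depends on all three.

```python
# Hypothetical thresholds standing in for the macros at the top of the module.
# The names and values are placeholders, not the real ones.
MAX_BUFFERED_DOCS = 100
MAX_BUFFERED_ATTACHMENTS = 10
MAX_BUFFER_MEMORY_BYTES = 10 * 1024 * 1024


def estimate_attachments(nlinks):
    """Estimate in-flight attachments from the number of linked processes.

    Each attachment download spawns two processes (a dedicated ibrowse
    worker and an attachment receiver), so the estimate is 0.5 * nlinks.
    """
    return 0.5 * nlinks


def should_flush(ndocs, nlinks, memory_bytes):
    """Return True when any one threshold is reached (assumed "or" logic)."""
    return (
        ndocs >= MAX_BUFFERED_DOCS
        or estimate_attachments(nlinks) >= MAX_BUFFERED_ATTACHMENTS
        or memory_bytes >= MAX_BUFFER_MEMORY_BYTES
    )
```

With these placeholder values, a buffer holding 5 docs and 20 linked processes would flush, because the estimated 10 attachments reaches the attachment threshold even though the doc count is low.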
The replication module is getting pretty hairy again; in my opinion
it's probably time to refactor out the attachment stuff into its own
module. I may get around to that tomorrow if no one objects.
What do you think about adding binary backoff to help with unreliable
links? Even if attachments are buffered to disk there's still the
issue of making checkpoint progress in the face of link failure. Or
maybe checkpoint the buffer on any failure (although that won't help
the situation where CouchDB quits).
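
To make the suggestion concrete, here is a rough sketch of what I mean, reading "binary backoff" as binary exponential backoff (the delay doubles after each failure, up to a cap) and checkpointing on every failure. It is Python for illustration, not a proposal for the Erlang code; `operation`, `checkpoint`, and all the parameter names and defaults are hypothetical.

```python
import time


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * 2 ** attempt)


def run_with_backoff(operation, checkpoint, max_retries=5,
                     base=1.0, cap=60.0, sleep=time.sleep):
    """Call `operation` until it succeeds, doubling the wait after each failure.

    `checkpoint` is called on every failure so that progress made so far
    is recorded before waiting. Both callables are hypothetical hooks.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except ConnectionError:
            checkpoint()  # save progress on any link failure
            if attempt >= max_retries:
                raise  # give up after the final retry has failed
            sleep(backoff_delay(attempt, base, cap))
            attempt += 1
```

The `sleep` parameter is only there so the waiting can be replaced in tests. As noted above, checkpointing on failure does nothing for the case where the whole process quits.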
Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
Isn't it enough to see that a garden is beautiful without having to
believe that there are fairies at the bottom of it too?
-- Douglas Adams