Re: Attachment Replication Problem - Bug Found

Adam Kocoloski Sat, 16 May 2009 09:10:28 -0700

On May 16, 2009, at 11:22 AM, Antony Blakey wrote:

On 16/05/2009, at 11:07 PM, Adam Kocoloski wrote:
No, I don't believe so. ibrowse accepts a {stream_to, pid()} option. It accumulates packets until it reaches a threshold configurable by {stream_chunk_size, integer()} (default 1MB), then sends the data to the Pid. I don't think ibrowse is writing to disk at any point in the process. We do see that when streaming really large attachments, ibrowse becomes the biggest memory user in the emulator.
This is what I thought was happening, which means that with small documents with many attachments (say > 1Mb) you could potentially end up with masses of open connections representing data promises that are only forced at checkpoint time, so that's not scalable. I think the number of open ibrowse connections (which I see doesn't necessarily match the number of unforced promises) needs to be an input to the checkpoint decision.

So, I think there's still some confusion here. By "open connections" do you mean TCP connections to the source? That number is never higher than 10. ibrowse does pipeline requests on those 10 connections, so there could be as many as 1000 simultaneous HTTP requests. However, those requests complete as soon as the data reaches the ibrowse client process, so in fact the number of outstanding requests during replication is usually very small. We're not doing flow control at the TCP socket layer.

If by "open connections" you really mean "attachment receiver processes spawned by the couch_rep gen_server" I think you'd be closer to the mark. We can get an approximate handle on that just by counting the number of links to the gen_server.
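As a rough illustration of using the receiver count as an input to the checkpoint decision, here is a minimal Python sketch. It is not the couch_rep code and not the fix under test; the Receiver class, the should_checkpoint function, and both thresholds are hypothetical names and values chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class Receiver:
    """Hypothetical stand-in for one attachment receiver process."""
    queued_bytes: int  # data sitting in its mailbox, not yet written out


def should_checkpoint(buffered_doc_bytes, receivers,
                      max_bytes=10 * 1024 * 1024, max_receivers=100):
    """Return True when buffered documents plus receiver backlog hit a limit.

    Both limits are illustrative assumptions, not CouchDB defaults.
    """
    backlog = sum(r.queued_bytes for r in receivers)
    too_much_data = buffered_doc_bytes + backlog >= max_bytes
    too_many_receivers = len(receivers) >= max_receivers
    return too_much_data or too_many_receivers
```

In this sketch a single receiver holding a large attachment in its mailbox is enough to trigger a checkpoint even when the document buffer itself is nearly empty.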

I'm not sure I understand what part is "not scalable". I agree that ignoring the attachment receivers and their mailboxes when deciding whether to checkpoint is a big problem. I'm testing a fix for that right now. Is there something else you meant by that statement?

Best,


Adam

P.S. One issue in my mind is that we only do the checkpoint test after we receive a document. We could end up in a situation where a document request is sitting in a pipeline behind a huge attachment, and the checkpoint test won't execute until the entire attachment is downloaded into memory. There are ways around this, e.g. using ibrowse:spawn_link_worker_process/2 to bypass the default connection pool for attachment downloads.
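The pipelining problem described in the P.S. can be modelled with a few lines of Python. This is only a toy model, not ibrowse code: the completion_times function and the transfer times are hypothetical, and it ignores bandwidth sharing between connections. It assumes only that responses on one pipelined connection arrive in request order.

```python
def completion_times(transfer_secs):
    """Finish time of each response on a single pipelined connection.

    Responses come back strictly in request order, so each one finishes
    only after everything queued ahead of it has been transferred.
    """
    done, clock = [], 0.0
    for secs in transfer_secs:
        clock += secs
        done.append(clock)
    return done


# Hypothetical transfer times: a huge attachment (60 s) is queued ahead
# of a small document (0.5 s) on the same connection.
shared = completion_times([60.0, 0.5])

# With a dedicated connection for the attachment, the document is first
# in line on its own pipeline.
attachment_conn = completion_times([60.0])
doc_conn = completion_times([0.5])
```

In the shared case the document, and with it the checkpoint test, waits 60.5 seconds; on a separate connection it arrives after 0.5 seconds. That is the effect a dedicated worker process for attachment downloads is meant to achieve.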
