On 17/05/2009, at 12:09 AM, Adam Kocoloski wrote:
So, I think there's still some confusion here. By "open connections" do you mean TCP connections to the source? That number is never higher than 10. ibrowse does pipeline requests on those 10 connections, so there could be as many as 1000 simultaneous HTTP requests. However, those requests complete as soon as the data reaches the ibrowse client process, so in fact the number of outstanding requests during replication is usually very small. We're not doing flow control at the TCP socket layer.
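For reference, those caps are per-host ibrowse options. A minimal sketch (the option names are ibrowse's real API; the URL is illustrative, and the pipeline depth of 100 is just the value implied by 10 x 100 = 1000):

    %% 10 connections x 100 pipelined requests = up to 1000 in-flight
    %% HTTP requests against the source host.
    Options = [{max_sessions, 10},        % at most 10 TCP connections per host
               {max_pipeline_size, 100},  % pipelined requests per connection
               {stream_to, self()}].      % deliver the response asynchronously
    {ibrowse_req_id, ReqId} =
        ibrowse:send_req("http://source.example.com:5984/db/doc",
                         [], get, [], Options).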
OK, I understand that now. That means that a document with > 1000 attachments can't be replicated, because ibrowse will never send ibrowse_async_headers for the excess attachments to attachment_loop, which needs to happen for every attachment before any of the data is read by doc_flush_binaries. Which is to say: every document attachment needs to start, i.e. receive its headers, before any attachment bodies are consumed.
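To make that concrete, here is a simplified sketch of the per-attachment receiver (the message shapes are ibrowse's async protocol; the structure is an approximation of attachment_loop, not the actual replicator code):

    %% Each attachment spawns a receiver that blocks here until ibrowse
    %% actually dispatches its request and the response headers arrive.
    attachment_receiver(ReqId) ->
        receive
            {ibrowse_async_headers, ReqId, _Status, _Headers} ->
                collect_body(ReqId)
        end.

    %% Body chunks are only drained later, when doc_flush_binaries pulls
    %% them. With > 1000 attachments the excess requests never leave the
    %% ibrowse queue, so their receivers block in the receive above forever.
    collect_body(ReqId) ->
        receive
            {ibrowse_async_response, ReqId, Data} ->
                [Data | collect_body(ReqId)];
            {ibrowse_async_response_end, ReqId} ->
                []
        end.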
With concurrent replications the maximum number of attachments is reduced, and it's possible to get a deadlock where the ibrowse queue is full but no document has all of its attachment downloads started.
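For example, with a capacity of 1000 in-flight requests, two concurrent replications each fetching a document with 600 attachments could each get roughly 500 requests accepted before the queue fills: neither document has all of its attachment downloads started, so neither can ever begin draining bodies, and both replications stall.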
I'm not sure I understand what part is "not scalable". I agree that ignoring the attachment receivers and their mailboxes when deciding whether to checkpoint is a big problem. I'm testing a fix for that right now. Is there something else you meant by that statement?

Best,
I didn't know about the ibrowse pool, so that part is scalable, i.e. the number of connections and requests is bounded. If my comments above are correct, though, the current architecture isn't scalable with respect to the number of attachments per document in the single-replicator case, and the limit is a more complicated equation in the multiple-replicator case.
P.S. One issue in my mind is that we only do the checkpoint test after we receive a document. We could end up in a situation where a document request is sitting in a pipeline behind a huge attachment, and the checkpoint test won't execute until the entire attachment is downloaded into memory. There are ways around this, e.g. using ibrowse:spawn_link_worker_process/2 to bypass the default connection pool for attachment downloads.
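Something like this, using ibrowse's documented direct-worker API (host, port and URL here are illustrative):

    %% Open a dedicated connection outside the default pool, so a large
    %% attachment download can't sit in a pipeline ahead of document requests.
    {ok, Conn} = ibrowse:spawn_link_worker_process("source.example.com", 5984).
    {ibrowse_req_id, ReqId} =
        ibrowse:send_req_direct(Conn,
                                "http://source.example.com:5984/db/docid/attname",
                                [], get, [], [{stream_to, self()}]).
    %% ... receive the ibrowse_async_* messages as usual, then tear down:
    ibrowse:stop_worker_process(Conn).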
Requiring every attachment to be started but not completed seems to me to be a fundamental issue.
In my case, I have some large attachments and unreliable links, so I'm partial to a solution that allows progress on partial attachments across link failures. We could get this by not delaying the attachment downloads, buffering them to disk, and using Range requests on the GET to resume partial downloads. This would solve several problems because it starts from the requirement to always make progress and never redo work. It seems like it could be done reasonably transparently just by modifying the attachment download code.
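A minimal sketch of that idea, assuming the source honours byte-range requests (the module, function and file names are illustrative, not existing replicator code):

    -module(att_spool).
    -include_lib("kernel/include/file.hrl").
    -export([fetch/2]).

    %% Fetch Url into SpoolFile, resuming from whatever is already on disk.
    fetch(Url, SpoolFile) ->
        Offset = case file:read_file_info(SpoolFile) of
                     {ok, #file_info{size = Size}} -> Size;
                     _ -> 0
                 end,
        Range = lists:concat(["bytes=", Offset, "-"]),
        {ok, Fd} = file:open(SpoolFile, [append, raw, binary]),
        {ibrowse_req_id, ReqId} =
            ibrowse:send_req(Url, [{"Range", Range}], get, [],
                             [{stream_to, self()}]),
        stream_to_disk(ReqId, Fd).

    %% A real implementation would check for a 206 Partial Content status
    %% before appending, and truncate the spool file if it gets a 200.
    stream_to_disk(ReqId, Fd) ->
        receive
            {ibrowse_async_headers, ReqId, _Status, _Headers} ->
                stream_to_disk(ReqId, Fd);
            {ibrowse_async_response, ReqId, Data} ->
                ok = file:write(Fd, Data),
                stream_to_disk(ReqId, Fd);
            {ibrowse_async_response_end, ReqId} ->
                file:close(Fd)
        end.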
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Nothing is really work unless you would rather be doing something else.
  -- J. M. Barrie
