Re: Attachment Replication Problem - Bug Found

Adam Kocoloski Sun, 17 May 2009 06:27:54 -0700

On May 16, 2009, at 8:30 PM, Antony Blakey wrote:

On 17/05/2009, at 12:09 AM, Adam Kocoloski wrote:
So, I think there's still some confusion here. By "open connections" do you mean TCP connections to the source? That number is never higher than 10. ibrowse does pipeline requests on those 10 connections, so there could be as many as 1000 simultaneous HTTP requests. However, those requests complete as soon as the data reaches the ibrowse client process, so in fact the number of outstanding requests during replication is usually very small. We're not doing flow control at the TCP socket layer.
OK, I understand that now. That means that a document with > 1000 attachments can't be replicated because ibrowse will never send ibrowse_async_headers for the excess attachments to attachment_loop, which needs to happen for every attachment before any of the data is read by doc_flush_binaries. Which is to say that every document attachment needs to start, e.g. receive headers, before any attachment bodies are consumed.

Not quite. So, this discussion is going to quickly become even more confusing because as of yesterday attachments are downloaded on dedicated connections outside the load-balanced connection pool. For the sake of argument let's stick with the behavior as of 2 days ago at first.

I keep coming back to this key point: _ibrowse has no flow control_. It doesn't matter whether we consume the ibrowse_async_headers message in the attachment receiver or not; ibrowse is still going to immediately send all those ibrowse_async_response messages our way.

Now, your point about limits on the number of attachments in a document is a good one. What I imagine would happen is the following:

1) couch_rep spawns off 1000+ attachment requests to ibrowse for a single document
2) ibrowse starts sending back {error, retry_later} responses when the queue is full
3) the attachment receiver processes start exiting with attachment_request_failed
4) couch_rep traps the exits and reboots the document enumerator starting at current_seq
5) repeat
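
The loop above can be modelled with a small Python sketch. This is not couch_rep or ibrowse code: the function names are hypothetical, the pipeline depth of 100 is only inferred from the "10 connections, 1000 requests" figures in this thread, and the model assumes no request completes before the whole batch for the document has been submitted.

```python
# Hypothetical model of the retry loop; not actual couch_rep or ibrowse code.
MAX_CONNECTIONS = 10   # from the thread: never more than 10 TCP connections
MAX_PIPELINE = 100     # assumption: inferred from 10 connections x 100 = 1000

def submit_attachments(n_attachments, capacity=MAX_CONNECTIONS * MAX_PIPELINE):
    """Return (accepted, rejected) counts for one document's attachment requests."""
    accepted = min(n_attachments, capacity)
    rejected = n_attachments - accepted
    return accepted, rejected

def replicate_document(n_attachments, max_restarts=5):
    """Restart the enumerator whenever any request is rejected."""
    for attempt in range(1, max_restarts + 1):
        _, rejected = submit_attachments(n_attachments)
        if rejected == 0:
            return ("ok", attempt)
        # A rejected request stands in for {error, retry_later}: the receiver
        # exits, the enumerator restarts at the same sequence number, and the
        # same document is attempted again with the same outcome.
    return ("stuck", max_restarts)
```

Under these assumptions a document with exactly 1000 attachments goes through on the first attempt, while one with 1001 never makes progress no matter how many restarts are allowed.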

Obviously this is not a good situation. Now, I mentioned earlier that as of yesterday the attachment downloads are each done on dedicated connections. I pulled them out of the connection pool so that a document download didn't get stuck behind a giant attachment download (the end result would be one way to make couch run out of memory). This means that the max_connections x max_pipeline doesn't apply to attachments. Of course, using dedicated connections has its own scalability problems. Setting up and tearing down all of those connections for the "lots of small attachments" case introduces a significant cost, and eventually we could have so many connections in TIME_WAIT that we run out of ephemeral ports.
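
To put rough numbers on the TIME_WAIT concern, here is a back-of-the-envelope sketch in Python. The port count and the TIME_WAIT duration are illustrative assumptions (they vary by operating system and configuration), and the helper names are made up for this example.

```python
# Back-of-the-envelope estimate; all figures passed in are assumptions.

def ports_in_time_wait(connections_per_second, time_wait_seconds):
    """Steady-state number of local ports held in TIME_WAIT."""
    return connections_per_second * time_wait_seconds

def max_new_connections_per_second(ephemeral_ports, time_wait_seconds):
    """Highest sustained rate of new connections to one destination
    before the ephemeral port range is exhausted."""
    if ephemeral_ports <= 0 or time_wait_seconds <= 0:
        raise ValueError("both arguments must be positive")
    return ephemeral_ports / time_wait_seconds

# Example: assume about 28,000 usable ephemeral ports and a 60 second TIME_WAIT.
rate = max_new_connections_per_second(28232, 60)
```

With those assumed values the ceiling is a few hundred new connections per second to a single source host, which a document with thousands of tiny attachments could plausibly reach.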

A better solution might be to have a separate load-balanced connection pool just for attachments. We'd have to exercise some care not to retry attachment requests on a connection that already has requests in the pipeline.

In my case, I have some large attachments and unreliable links, so I'm partial to a solution that allows progress even of partial attachments during link failure. We could get this by not delaying the attachments, and buffering them to disk, using range requests on the get for partial downloads. This would solve some problems because it starts with the requirement to always make progress, never redoing work. This seems like it could be done reasonably transparently just by modifying the attachment download code.

I definitely like the idea of Range support for making progress in the event of link failure. In theory, it would be possible to build this into ibrowse so we could transparently use it for very large documents as well.
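
As a minimal sketch of what Range-based resumption could look like (in Python rather than Erlang, and not tied to ibrowse), the idea is to ask only for the bytes not yet received after each failure. The `fetch` callable here is hypothetical: it stands in for an HTTP GET that takes request headers and returns whatever bytes arrived before the link dropped.

```python
# Sketch of resumable download via HTTP Range; `fetch` is a hypothetical callable.

def range_header(bytes_received, total_size=None):
    """Build the Range header for the remaining bytes, or None if complete."""
    if bytes_received < 0:
        raise ValueError("bytes_received must be non-negative")
    if total_size is not None and bytes_received >= total_size:
        return None
    return {"Range": "bytes=%d-" % bytes_received}

def download_with_resume(fetch, total_size, max_attempts=10):
    """Call fetch(headers) repeatedly, never re-requesting bytes already held."""
    buf = bytearray()
    for _ in range(max_attempts):
        headers = range_header(len(buf), total_size)
        if headers is None:
            return bytes(buf)
        # fetch may return fewer bytes than requested if the link fails.
        buf.extend(fetch(headers))
    if len(buf) >= total_size:
        return bytes(buf)
    raise IOError("download incomplete after %d attempts" % max_attempts)
```

Each retry starts from the current buffer length, so work already done is never repeated. A real implementation would also need to check that the server answered 206 Partial Content and that the attachment did not change between requests.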

I'm not absolutely opposed to saving attachments to temporary files on disk, but I'd prefer to exhaust in-memory options first.


Cheers, Adam
