On 17/05/2009, at 9:27 PM, Adam Kocoloski wrote:

On May 16, 2009, at 8:30 PM, Antony Blakey wrote:

On 17/05/2009, at 12:09 AM, Adam Kocoloski wrote:

So, I think there's still some confusion here. By "open connections" do you mean TCP connections to the source? That number is never higher than 10. ibrowse does pipeline requests on those 10 connections, so there could be as many as 1000 simultaneous HTTP requests. However, those requests complete as soon as the data reaches the ibrowse client process, so in fact the number of outstanding requests during replication is usually very small. We're not doing flow control at the TCP socket layer.
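
To make the arithmetic behind the 1000 figure explicit, here is a minimal Python sketch. The pipeline depth of 100 per connection is an assumption inferred from the 10 connections and 1000 requests quoted above, and the names are illustrative rather than actual ibrowse settings.

```python
# Illustrative arithmetic only; these names are not ibrowse configuration keys.
MAX_CONNECTIONS = 10   # TCP connections to the source (figure from the thread)
MAX_PIPELINE = 100     # ASSUMED per-connection pipeline depth (1000 / 10)


def max_in_flight(connections: int, pipeline_depth: int) -> int:
    """Upper bound on HTTP requests queued across all pooled connections."""
    return connections * pipeline_depth


print(max_in_flight(MAX_CONNECTIONS, MAX_PIPELINE))
```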

OK, I understand that now. That means that a document with > 1000 attachments can't be replicated, because ibrowse will never send ibrowse_async_headers for the excess attachments to attachment_loop, which needs to happen for every attachment before any of the data is read by doc_flush_binaries. In other words, every attachment of a document needs to start (i.e. receive its headers) before any attachment body is consumed.

Not quite. So, this discussion is going to quickly become even more confusing because as of yesterday attachments are downloaded on dedicated connections outside the load-balanced connection pool. For the sake of argument let's stick with the behavior as of 2 days ago at first.

I keep coming back to this key point: _ibrowse has no flow control_. It doesn't matter whether we consume the ibrowse_async_headers message in the attachment receiver or not; ibrowse is still going to immediately send all those ibrowse_async_response messages our way.

Sure, my point was that once the queue is full it won't send the ibrowse_async_headers (because it will never start the connection). I didn't realise that it would fail before that (as you explain below). I was assuming it would just block. Hence all my previous comments.

Now, your point about limits on the number of attachments in a document is a good one. What I imagine would happen is the following:

1) couch_rep spawns off 1000+ attachment requests to ibrowse for a single document
2) ibrowse starts sending back {error, retry_later} responses when the queue is full
3) the attachment receiver processes start exiting with attachment_request_failed
4) couch_rep traps the exits and reboots the document enumerator starting at current_seq
5) repeat
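
The loop in steps 1 to 5 can be modelled with a small Python sketch. This is a toy model, not couch_rep or ibrowse code: the capacity of 1000 is the figure from earlier in the thread, the function names are hypothetical, and it assumes all attachment requests for a document are submitted before any of them completes.

```python
QUEUE_CAPACITY = 1000  # ASSUMED: 10 connections x 100 pipelined requests


def submit_all(attachment_count, capacity=QUEUE_CAPACITY):
    """Steps 1 and 2: queue requests until the pool is full.

    Returns one result per attachment, either "queued" or "retry_later".
    """
    return ["queued" if i < capacity else "retry_later"
            for i in range(attachment_count)]


def replicate_document(attachment_count, max_restarts=3):
    """Steps 3 to 5: one retry_later fails the whole document.

    The enumerator restarts at the same sequence, so the same document is
    attempted again with the same outcome. max_restarts only bounds the
    model; the loop described above has no such bound.
    """
    for attempt in range(1, max_restarts + 1):
        results = submit_all(attachment_count)
        if "retry_later" not in results:
            return ("ok", attempt)
        # A receiver exited with attachment_request_failed: restart and retry.
    return ("stuck", max_restarts)


print(replicate_document(1000))
print(replicate_document(1001))
```

Under these assumptions a document at or below the queue capacity goes through on the first pass, while one attachment more than the capacity never makes progress, however many restarts are allowed.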

Obviously this is not a good situation. Now, I mentioned earlier that as of yesterday the attachment downloads are each done on dedicated connections. I pulled them out of the connection pool so that a document download didn't get stuck behind a giant attachment download (the end result would be one way to make couch run out of memory). This means that the max_connections x max_pipeline limit doesn't apply to attachments. Of course, using dedicated connections has its own scalability problems. Setting up and tearing down all of those connections for the "lots of small attachments" case introduces a significant cost, and eventually we could have so many connections in TIME_WAIT that we run out of ephemeral ports.

That new scalability problem is what I thought the original problem was with ibrowse before I learnt it had a pool.

A better solution might be to have a separate load-balanced connection pool just for attachments. We'd have to exercise some care not to retry attachment requests on a connection that already has requests in the pipeline.

In my case, I have some large attachments and unreliable links, so I'm partial to a solution that allows progress even of partial attachments during link failure. We could get this by not delaying the attachments, and buffering them to disk, using range requests on the get for partial downloads. This would solve some problems because it starts with the requirement to always make progress, never redoing work. This seems like it could be done reasonably transparently just by modifying the attachment download code.
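
As a rough illustration of the "buffer to disk and resume with range requests" idea, here is a minimal Python sketch. It is not CouchDB or ibrowse code: `fetch` is a hypothetical callable standing in for a ranged GET over an unreliable link, and the function names, file name, and retry bound are assumptions made for the example.

```python
import os
import tempfile


def size_on_disk(path):
    """Bytes already buffered for this attachment (0 if nothing yet)."""
    return os.path.getsize(path) if os.path.exists(path) else 0


def range_header(bytes_on_disk):
    """HTTP Range header value asking for everything we do not have yet."""
    return f"bytes={bytes_on_disk}-"


def download_with_resume(fetch, path, total_size, max_attempts=10):
    """Append to `path` until it holds `total_size` bytes.

    `fetch(offset)` is HYPOTHETICAL: it returns whatever bytes arrived,
    starting at `offset`, before the link dropped. A real client would send
    `Range: range_header(offset)` with the GET.
    """
    for _ in range(max_attempts):
        have = size_on_disk(path)
        if have >= total_size:
            return True
        chunk = fetch(have)
        with open(path, "ab") as f:  # append only: earlier work is never redone
            f.write(chunk)
    return size_on_disk(path) >= total_size


# Usage: a link that delivers at most 300 bytes per attempt.
body = bytes(range(256)) * 4  # 1024-byte attachment
offsets = []


def flaky_fetch(offset):
    offsets.append(offset)
    return body[offset:offset + 300]


with tempfile.TemporaryDirectory() as tmp:
    part = os.path.join(tmp, "attachment.part")
    done = download_with_resume(flaky_fetch, part, len(body))
    with open(part, "rb") as f:
        data = f.read()

print(done, offsets)
```

Each attempt picks up at the current file size, so a dropped link costs only the bytes that were in flight and never the bytes already on disk.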

I definitely like the idea of Range support for making progress in the event of link failure. In theory, it would be possible to build this into ibrowse so we could transparently use it for very large documents as well.

I'm not absolutely opposed to saving attachments to temporary files on disk, but I'd prefer to exhaust in-memory options first.


I'm pretty sure that the only scalable solution that will handle documents with significant numbers of attachments is to avoid having all the attachments be in-progress downloads before the document is written, e.g. by either buffering to disk or making the more radical modification of allowing attachments to be written before the document, which I guess is not going to happen. And once you allow buffering to disk as a last resort, you may as well use it as the default mechanism. Apart from anything else, it's a good basis for partial attachment download restart.

I'm wondering if it's worth exhausting in-memory options if disk buffering is absolutely required for at least one use case?

The problem I see with building it into ibrowse is the requirement to inject the restart/file management/expiration policies into ibrowse.

Cheers,

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

In anything at all, perfection is finally attained not when there is no longer anything to add, but when there is no longer anything to take away.
  -- Antoine de Saint-Exupery

