On 17/05/2009, at 9:27 PM, Adam Kocoloski wrote:
On May 16, 2009, at 8:30 PM, Antony Blakey wrote:
On 17/05/2009, at 12:09 AM, Adam Kocoloski wrote:
So, I think there's still some confusion here. By "open
connections" do you mean TCP connections to the source? That
number is never higher than 10. ibrowse does pipeline requests on
those 10 connections, so there could be as many as 1000
simultaneous HTTP requests. However, those requests complete as
soon as the data reaches the ibrowse client process, so in fact
the number of outstanding requests during replication is usually
very small. We're not doing flow control at the TCP socket layer.
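For concreteness, the capacity arithmetic Adam describes can be sketched as follows. This is a language-neutral illustration in Python; the parameter names mirror ibrowse's max_sessions / max_pipeline_size options, with the values from the thread:

```python
# In-flight request ceiling for a load-balanced, pipelined pool.
# 10 TCP connections x 100 pipelined requests each = 1000 requests
# that can be outstanding at once; nothing throttles beyond that.
def pipeline_capacity(max_sessions=10, max_pipeline_size=100):
    """Maximum simultaneous HTTP requests the pool can carry."""
    return max_sessions * max_pipeline_size

print(pipeline_capacity())  # 1000
```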
OK, I understand that now. That means that a document with > 1000
attachments can't be replicated because ibrowse will never send
ibrowse_async_headers for the excess attachments to
attachment_loop, which needs to happen for every attachment before
any of the data is read by doc_flush_binaries. Which is to say that
every document attachment needs to start (i.e. receive its headers)
before any attachment bodies are consumed.
Not quite. So, this discussion is going to quickly become even more
confusing because as of yesterday attachments are downloaded on
dedicated connections outside the load-balanced connection pool.
For the sake of argument let's stick with the behavior as of 2 days
ago at first.
I keep coming back to this key point: _ibrowse has no flow
control_. It doesn't matter whether we consume the
ibrowse_async_headers message in the attachment receiver or not;
ibrowse is still going to immediately send all those
ibrowse_async_response messages our way.
Sure, my point was that once the queue is full it won't send the
ibrowse_async_headers (because it will never start the connection). I
didn't realise that it would fail before that (as you explain below).
I was assuming it would just block. Hence all my previous comments.
Now, your point about limits on the number of attachments in a
document is a good one. What I imagine would happen is the following:
1) couch_rep spawns off 1000+ attachment requests to ibrowse for a
single document
2) ibrowse starts sending back {error, retry_later} responses when
the queue is full
3) the attachment receiver processes start exiting with
attachment_request_failed
4) couch_rep traps the exits and reboots the document enumerator
starting at current_seq
5) repeat
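The five steps above can be sketched as a toy simulation. All the names here (QUEUE_CAPACITY, replicate_doc, replicate) are invented for illustration; the real couch_rep and ibrowse differ in detail, but the shape of the loop is the same:

```python
# Toy model of the retry loop: a document whose attachment count
# exceeds the pool's capacity hits the same wall on every retry.
QUEUE_CAPACITY = 1000  # max_connections x max_pipeline

def replicate_doc(num_attachments):
    """True iff every attachment request could be queued at once."""
    accepted = min(num_attachments, QUEUE_CAPACITY)
    rejected = num_attachments - accepted  # get {error, retry_later}
    return rejected == 0

def replicate(num_attachments, max_retries=5):
    for attempt in range(1, max_retries + 1):
        if replicate_doc(num_attachments):
            return attempt  # made it past this document
        # a receiver exited with attachment_request_failed;
        # reboot the enumerator at current_seq and try again
    return None  # no progress, ever

print(replicate(999))   # 1: fits in the queue, first pass succeeds
print(replicate(1001))  # None: every retry fails identically
```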
Obviously this is not a good situation. Now, I mentioned earlier
that as of yesterday the attachment downloads are each done on
dedicated connections. I pulled them out of the connection pool so
that a document download didn't get stuck behind a giant attachment
download (the end result would be one way to make couch run out of
memory). This means that the max_connections x max_pipeline doesn't
apply to attachments. Of course, using dedicated connections has
its own scalability problems. Setting up and tearing down all of
those connections for the "lots of small attachments" case
introduces a significant cost, and eventually we could have so many
connections in TIME_WAIT that we run out of ephemeral ports.
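Rough arithmetic makes the TIME_WAIT concern concrete. The figures below are illustrative assumptions, not measurements: Linux's default ephemeral port range is roughly 32768-60999, and sockets linger in TIME_WAIT for about 60 seconds:

```python
# With one dedicated connection per attachment, each finished
# download parks an ephemeral port in TIME_WAIT for ~60s, so the
# sustainable connection rate is bounded by ports / linger-time.
EPHEMERAL_PORTS = 61000 - 32768  # ~28k usable ports (illustrative)
TIME_WAIT_SECS = 60

max_conns_per_sec = EPHEMERAL_PORTS / TIME_WAIT_SECS
print(round(max_conns_per_sec))  # 471: ceiling on attachments/sec
```

Many small attachments can easily exceed a few hundred downloads per second, at which point new connections start failing for lack of ports.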
That new scalability problem is what I originally thought the problem
with ibrowse was, before I learnt it had a connection pool.
A better solution might be to have a separate load-balanced
connection pool just for attachments. We'd have to exercise some
care not to retry attachment requests on a connection that already
has requests in the pipeline.
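The care Adam describes can be sketched as a pool that load-balances normally but refuses to retry on a connection with requests already pipelined. All class and method names below are invented for illustration:

```python
# Sketch of a dedicated attachment pool: least-loaded checkout for
# new requests, but retries only go to completely idle connections,
# so a second failure can't take down unrelated pipelined requests.
class Conn:
    def __init__(self, cid):
        self.cid = cid
        self.in_flight = 0

class AttachmentPool:
    def __init__(self, size):
        self.conns = [Conn(i) for i in range(size)]

    def checkout(self):
        # normal load balancing: least pipelined connection wins
        return min(self.conns, key=lambda c: c.in_flight)

    def checkout_for_retry(self):
        # never pipeline a retry behind other in-flight requests
        idle = [c for c in self.conns if c.in_flight == 0]
        return idle[0] if idle else None

pool = AttachmentPool(3)
busy = pool.checkout()
busy.in_flight += 1
retry_conn = pool.checkout_for_retry()
print(retry_conn is not busy)  # True: the busy connection is skipped
```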
In my case, I have some large attachments and unreliable links, so
I'm partial to a solution that allows progress even on partially
downloaded attachments across link failures. We could get this by not
delaying the attachments, buffering them to disk, and using Range
requests on the GET to resume partial downloads. This would solve
several problems because it starts from the requirement to always
make progress and never redo work. It seems like it could be done
reasonably transparently just by modifying the attachment download
code.
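A Range-based resume loop might look like the sketch below. The flaky "server" is simulated in memory so the example is self-contained; a real client would send a `Range: bytes=<offset>-` header and append the body to a buffer file:

```python
# Resumable download sketch: on each link failure, re-request from
# the byte offset already buffered, so progress is never lost.
BLOB = bytes(range(256)) * 40  # a 10240-byte "attachment"

def flaky_fetch(offset, fail_after=4096):
    """Serve at most fail_after bytes from offset, then drop the link."""
    chunk = BLOB[offset:offset + fail_after]
    truncated = offset + len(chunk) < len(BLOB)
    return chunk, truncated

def download_with_resume():
    buf = bytearray()
    while True:
        # equivalent of sending "Range: bytes=<len(buf)>-"
        chunk, dropped = flaky_fetch(len(buf))
        buf.extend(chunk)  # partial progress survives the failure
        if not dropped:
            return bytes(buf)

assert download_with_resume() == BLOB
print("downloaded", len(BLOB), "bytes in",
      -(-len(BLOB) // 4096), "range requests")  # 3 requests
```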
I definitely like the idea of Range support for making progress in
the event of link failure. In theory, it would be possible to build
this into ibrowse so we could transparently use it for very large
documents as well.
I'm not absolutely opposed to saving attachments to temporary files
on disk, but I'd prefer to exhaust in-memory options first.
I'm pretty sure that the only scalable solution that will handle
documents with significant numbers of attachments is to avoid having
all the attachments in-progress downloading before the document is
written: either buffering to disk, or the more radical modification
of allowing attachments to be written before the document, which I
guess is not going to happen. And once you allow buffering to disk as
a last resort, you may as well use it as the default mechanism. Apart
from anything else, it's a good basis for restarting partial
attachment downloads.
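A minimal sketch of the disk-buffering idea, with invented names (buffer_attachment, flush_binaries are not the real couch_rep functions): spool each attachment to a temp file as it arrives, then stream the files into the document write, so no document ever needs all its attachments in memory at once:

```python
# Spool attachments to temp files, then flush them with the doc.
import os
import tempfile

def buffer_attachment(stream_chunks):
    """Write an attachment stream to disk; return the temp path."""
    fd, path = tempfile.mkstemp(prefix="att_")
    with os.fdopen(fd, "wb") as f:
        for chunk in stream_chunks:
            f.write(chunk)
    return path

def flush_binaries(paths):
    """Stream each buffered attachment into the doc write, clean up."""
    total = 0
    for path in paths:
        with open(path, "rb") as f:
            total += len(f.read())  # stand-in for the real doc write
        os.unlink(path)
    return total

paths = [buffer_attachment([b"x" * 1024] * 4) for _ in range(3)]
print(flush_binaries(paths))  # 12288 bytes flushed via disk, not RAM
```

A temp file left over after a failed download is also exactly the state a Range-based restart needs: its size is the offset to resume from.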
I'm wondering if it's worth exhausting in-memory options if disk
buffering is absolutely required for at least one use case?
The problem I see with building it into ibrowse is the requirement to
inject the restart, file-management, and expiration policies into
ibrowse.
Cheers,
Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787
In anything at all, perfection is finally attained not when there is
no longer anything to add, but when there is no longer anything to
take away.
-- Antoine de Saint-Exupery