On Aug 18, 2009, at 4:33 AM, Brian Candler wrote:

On Sat, Aug 15, 2009 at 10:17:28AM -0700, Chris Anderson wrote:
One middle ground implementation that could work for throughput, would
be to use the batch=ok ets based storage, but instead of immediately
returning 202 Accepted, hold the connection open until the batch is
written, and return 201 Created after the batch is written. This would allow the server to optimize batch size, without the client needing to
worry about things, and we could return 201 Created and maintain our
strong consistency guarantees.
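The hold-until-committed idea above could be sketched roughly like this (a hypothetical illustration in Python, not CouchDB's Erlang internals; all class and method names are invented):

```python
import threading

class GroupCommitter:
    """Collects writes and commits them as one batch; each caller
    blocks until a commit covering its write has finished, so the
    server can still answer 201 Created with full durability.
    Purely a sketch of the idea, not CouchDB code."""

    def __init__(self):
        self._lock = threading.Lock()
        self._committed = threading.Condition(self._lock)
        self._pending = []        # docs waiting for the next commit
        self._generation = 0      # bumped after every commit

    def write(self, doc):
        with self._lock:
            self._pending.append(doc)
            gen = self._generation
            # Block until a later commit has flushed this batch.
            while self._generation == gen:
                self._committed.wait()
        return "201 Created"

    def commit_batch(self, storage):
        """Called by a background committer thread; one call stands in
        for a single write + fsync covering all queued docs."""
        with self._lock:
            batch, self._pending = self._pending, []
            storage.extend(batch)             # stand-in for the fsync
            self._generation += 1
            self._committed.notify_all()      # wake every waiting writer
```

The server is then free to decide how often `commit_batch` runs, trading a little latency on each request for far fewer fsyncs under load, while every client still gets a response only after its data is durable.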

Do you mean default to batch=ok behaviour? (In which case, if you don't want to batch you'd specify something else, e.g. x-couch-full-commit: true?)

This is fine by me. Of course, clients doing sequential writes may see very poor performance (i.e. write - wait response - write - wait response etc). However this approach should work well with HTTP pipelining, as well as with clients which open multiple concurrent HTTP connections. The replicator
would need to do pipelining, if it doesn't already.

Errm, it's going to be tough to pipeline PUTs and POSTs, as that's labeled a SHOULD NOT in RFC2616. Even if we know that it would be safe to pipeline PUTs in CouchDB, HTTP clients are probably not going to let it happen. I certainly agree about the connection pool, though. The replicator does use a connection pool, and it pipelines GET requests, too.

http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8.1.2.2

As I was attempting to say before: any solution which makes write guarantees
should expose behaviour which is meaningful to the client.

- there's no point doing a full commit on every write unless you delay
the HTTP response until after the commit (otherwise there's still a
window where the client thinks the data has gone safely to disk,
but actually it could be lost)

Right, and we do delay the response in that case, so I think it is meaningful.

- there's no point having two different forms of non-safe write, because
there's no reasonable way for the client to choose between them.
Currently we have 'batch=ok', and we also have a normal write without
'x-couch-full-commit: true' - both end up with the data sitting in RAM
for a while before going to disk, the difference being whether it's
Erlang RAM or VFS buffer cache RAM.

I like the idea of being able to tune the batch size internally within
the server. This could allow CouchDB to automatically adjust for
performance without changing consistency guarantees, eg: run large
batches when under heavy load, but when accessed by a single user,
just do full_commits all the time.

I agree. I also think it would be good to be able to tune this per DB, or
more simply, per write.

e.g. a PUT request could specify max_wait=2000 (if not specified, use a default value from the ini file). Subsequent requests could specify their own max_wait params, and a full commit would occur when the earliest of these times occurs. max_wait=0 would then replace the x-couch-full-commit header, which seems like a bit of a frig to me anyway.

You could also stop clients from being resource hogs by specifying a min_wait in the ini file. That is, if you set min_wait=100, then any client which insists on having a full commit by specifying max_wait=0 may find itself delayed up to 0.1s before its request is honoured.
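The interaction between a per-request max_wait and a server-side min_wait floor might look like the following sketch (parameter names are invented to match the proposal above, not a real CouchDB API):

```python
def effective_commit_delay(max_wait_ms, min_wait_ms, pending_deadlines_ms):
    """Illustrative sketch of the max_wait/min_wait idea.

    max_wait_ms:          the delay this request is willing to tolerate
    min_wait_ms:          server-enforced floor from the ini file
    pending_deadlines_ms: deadlines of earlier, still-uncommitted requests
    """
    # The server may hold even a max_wait=0 request for up to min_wait,
    # so eager clients can't force a full commit on every single write.
    this_deadline = max(max_wait_ms, min_wait_ms)
    # The commit fires at the earliest outstanding deadline.
    return min(pending_deadlines_ms + [this_deadline])
```

So with min_wait=100, a request asking for max_wait=0 would still wait up to 100 ms, unless an earlier request's deadline fires sooner and its commit covers this write too.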

I interpreted Chris' idea differently. Instead of exposing yet more ways to try to tune the DB, put the tuning logic into the server and let it choose when to commit in an attempt to optimize both latency and throughput.

A simple example might be to group together all outstanding write requests and do one commit for the group. When the write load is low, we commit after every update. When the disk is slow or the write load is high, we could have multiple incoming write requests while a single commit is in progress. Instead of committing each one separately (the current behavior AFAIK) we'd update them all together like a single _bulk_docs request. The latency for the earliest requests would increase, but the throughput would be much higher.
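A toy model of that trade-off (purely illustrative, not CouchDB code): every write that has arrived by the time a commit starts is flushed in that commit, like one _bulk_docs request, so a slow disk naturally produces bigger batches.

```python
def simulate_group_commit(arrival_times, commit_time):
    """Model group commit: writes arriving before a commit starts are
    flushed together in that commit.  Returns (number_of_commits,
    per-write completion times).  Timings are made up for illustration.
    """
    commits = 0
    completion = []
    i, now = 0, 0.0
    while i < len(arrival_times):
        start = max(now, arrival_times[i])
        # One commit covers every write already queued at 'start'.
        j = i
        while j < len(arrival_times) and arrival_times[j] <= start:
            j += 1
        finish = start + commit_time
        completion.extend([finish] * (j - i))
        commits += 1
        i, now = j, finish
    return commits, completion
```

With a slow disk (commit_time=10) and four writes arriving at t=0..3, the model does only two commits instead of four: the first write commits alone, and the three that arrived while that commit was in flight go out as a single batch. Later writes wait a bit longer, but the fsync count (and hence throughput) improves.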

In a perfect world I'd like to see x-couch-full-commit and _bulk_docs fall into disuse. I realize the latter won't happen because not everyone wants to implement an HTTP connection pool. batch=ok has very different semantics and so would still be useful, although I imagine that most uses of batch=ok are done to maximize throughput, not minimize latency. If the throughput of normal operation were "high enough", batch=ok probably wouldn't be that popular.

Best, Adam
