On Aug 18, 2009, at 4:33 AM, Brian Candler wrote:
On Sat, Aug 15, 2009 at 10:17:28AM -0700, Chris Anderson wrote:
One middle ground implementation that could work for throughput would be to use the batch=ok ets-based storage, but instead of immediately returning 202 Accepted, hold the connection open until the batch is written, and return 201 Created after the batch is written. This would allow the server to optimize batch size without the client needing to worry about things, and we could return 201 Created and maintain our strong consistency guarantees.
Do you mean default to batch=ok behaviour? (In which case, if you don't want to batch you'd specify something else, e.g. x-couch-full-commit: true?)
This is fine by me. Of course, clients doing sequential writes may see very poor performance (i.e. write - wait for response - write - wait for response, etc). However, this approach should work well with HTTP pipelining, as well as with clients which open multiple concurrent HTTP connections. The replicator would need to do pipelining, if it doesn't already.
Errm, it's going to be tough to pipeline PUTs and POSTs, as that's labeled a SHOULD NOT in RFC 2616. Even if we know that it would be safe to pipeline PUTs in CouchDB, HTTP clients are probably not going to let it happen. I certainly agree about the connection pool, though. The replicator does use a connection pool, and it pipelines GET requests, too.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8.1.2.2
As I was attempting to say before: any solution which makes write guarantees should expose behaviour which is meaningful to the client.
- there's no point doing a full commit on every write unless you delay the HTTP response until after the commit (otherwise there's still a window where the client thinks the data has gone safely to disk, but actually it could be lost)
Right, and we do delay the response in that case, so I think it is
meaningful.
- there's no point having two different forms of non-safe write, because there's no reasonable way for the client to choose between them. Currently we have 'batch=ok', and we also have a normal write without 'x-couch-full-commit: true' - both end up with the data sitting in RAM for a while before going to disk, the difference being whether it's Erlang RAM or VFS buffer cache RAM.
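The distinction being drawn here (application RAM vs. the OS buffer cache vs. the platter) is the same one any program faces; a purely illustrative Python analogy of the three durability levels, not CouchDB code:

```python
import os
import tempfile

# Three durability levels, roughly analogous to the three CouchDB write modes:

buf = [b'{"doc":1}\n']        # 1. batch=ok: data sits in process (Erlang) RAM;
                              #    lost if the server process dies

fd, path = tempfile.mkstemp()
os.write(fd, b'{"doc":2}\n')  # 2. plain write: data sits in the VFS buffer
                              #    cache; survives a process crash, but not
                              #    a machine crash
os.fsync(fd)                  # 3. full commit: flushed to the device
os.close(fd)                  #    (modulo the drive's own write cache)
```

The point of the bullet above is that levels 1 and 2 are indistinguishable from the client's perspective: neither guarantees anything until an fsync happens.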
I like the idea of being able to tune the batch size internally within the server. This could allow CouchDB to automatically adjust for performance without changing consistency guarantees, eg: run large batches when under heavy load, but when accessed by a single user, just do full_commits all the time.
I agree. I also think it would be good to be able to tune this per DB, or more simply, per write.

e.g. a PUT request could specify max_wait=2000 (if not specified, use a default value from the ini file). Subsequent requests could specify their own max_wait params, and a full commit would occur when the earliest of these times occurs. max_wait=0 would then replace the x-couch-full-commit: header, which seems like a bit of a frig to me anyway.
Clients could also be prevented from being resource hogs by specifying a min_wait in the ini file. That is, if you set min_wait=100, then any client which insists on having a full commit by specifying max_wait=0 may find itself delayed up to 0.1s before its request is honoured.
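The scheduling rule being proposed is small enough to sketch. A toy Python version, using the max_wait/min_wait names from the mail (this is not CouchDB's actual API, just the arithmetic):

```python
def commit_time(requests, min_wait_ms):
    """When should the next full commit fire?

    requests: list of (arrival_time_s, max_wait_ms) pairs for the
    pending writes. Each request's deadline is its arrival time plus
    the larger of its own max_wait and the server-wide min_wait floor;
    the commit fires at the earliest such deadline.
    """
    return min(arrival + max(max_wait, min_wait_ms) / 1000.0
               for arrival, max_wait in requests)
```

So a max_wait=0 client with min_wait=100 in the ini file is still delayed 0.1s, while a later request with a short max_wait can pull the commit earlier for everyone already waiting.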
I interpreted Chris' idea differently. Instead of exposing yet more
ways to try to tune the DB, put the tuning logic into the server and
let it choose when to commit in an attempt to optimize both latency
and throughput.
A simple example might be to group together all outstanding write
requests and do one commit for the group. When the write load is low,
we commit after every update. When the disk is slow or the write load
is high, we could have multiple incoming write requests while a single
commit is in progress. Instead of committing each one separately (the
current behavior AFAIK) we'd update them all together like a single
_bulk_docs request. The latency for the earliest requests would
increase, but the throughput would be much higher.
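The group-commit behaviour described above can be sketched in a few dozen lines. A toy Python version (CouchDB itself would do this in Erlang; all names here are made up for illustration): the first writer becomes the "leader" and commits batches; writers that arrive while a commit is on disk pile into the next batch and share a single commit.

```python
import threading

class GroupCommitter:
    """Toy group commit: while one commit is in progress, later writers
    join the next batch, so one durable write covers the whole group."""

    def __init__(self, commit_fn):
        self.commit_fn = commit_fn           # durably writes a list of updates
        self.lock = threading.Lock()
        self.batch = []                      # updates awaiting the next commit
        self.batch_done = threading.Event()  # set once self.batch is committed
        self.leader_busy = False

    def write(self, update):
        with self.lock:
            self.batch.append(update)
            done = self.batch_done           # the event for *our* batch
            follower = self.leader_busy
            if not follower:
                self.leader_busy = True
        if follower:
            done.wait()                      # our batch's commit wakes us
            return
        # Leader: commit batches until none are pending.
        while True:
            with self.lock:
                if not self.batch:
                    self.leader_busy = False
                    return
                batch, self.batch = self.batch, []
                batch_event, self.batch_done = self.batch_done, threading.Event()
            self.commit_fn(batch)            # one commit for the whole group
            batch_event.set()                # release every writer in the batch
```

Under light load every batch has one update, so this degrades to commit-per-write; under heavy load the batch grows to whatever arrived during the previous commit, which is exactly the latency-for-throughput trade described above.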
In a perfect world I'd like to see x-couch-full-commit and _bulk_docs fall into disuse. I realize the latter won't happen because not everyone wants to implement an HTTP connection pool. batch=ok has very different semantics and so would still be useful, although I imagine that most uses of batch=ok are done to maximize throughput, not minimize latency. If the throughput of normal operation were "high enough", batch=ok probably wouldn't be that popular.
Best, Adam