On Apr 14, 2010, at 7:59 AM, Matt Goodall wrote:

> Hi,
> 
> Over in couchdb-python land someone wanted to use batch=ok when
> creating and updating documents, so we added support.
> 
> I was semi-surprised to notice that _bulk_docs does not support
> batch=ok. I realise _bulk_docs essentially is a batch update but a
> _bulk_docs batch=ok would presumably allow CouchDB to buffer more in
> memory before writing to disk. What are your thoughts?

It's probably of limited utility.  If you're already batching on the client 
side, you can achieve the same effect by sending a larger batch.  I'm not 
opposed to it per se; I just don't think it will help throughput all that 
much.
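
A minimal sketch of the client-side alternative, assuming the standard
`POST /{db}/_bulk_docs` endpoint with a `{"docs": [...]}` body. The server
URL, the database name `mydb`, and the helper `bulk_docs_requests` are
hypothetical and exist only for illustration. The requests are built but not
sent, so the snippet runs without a CouchDB instance.

```python
import json
from urllib.request import Request


def bulk_docs_requests(base_url, db, docs, batch_size):
    """Yield one _bulk_docs POST request per chunk of `batch_size` documents."""
    url = f"{base_url}/{db}/_bulk_docs"
    for start in range(0, len(docs), batch_size):
        chunk = docs[start:start + batch_size]
        body = json.dumps({"docs": chunk}).encode("utf-8")
        yield Request(
            url,
            data=body,
            method="POST",
            headers={"Content-Type": "application/json"},
        )


# Hypothetical documents, server URL, and database name.
docs = [{"_id": f"doc-{i}", "value": i} for i in range(10)]
small = list(bulk_docs_requests("http://localhost:5984", "mydb", docs, 2))
large = list(bulk_docs_requests("http://localhost:5984", "mydb", docs, 10))

# Raising batch_size trades many small requests for one large one.
print(len(small), len(large))
```

Passing each request to `urllib.request.urlopen` would send it. Raising
`batch_size` is the "larger batch" knob: the same documents reach the server
in fewer requests.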

> 
> Now, this buffering is where the "implementation concerns" come in.
> According to the wiki:
> 
> "There is a query option batch=ok which can be used to achieve higher
> throughput at the cost of lower guarantees. When a PUT (or a document
> POST as described below) is sent using this option, it is not
> immediately written to disk. Instead it is stored in memory on a
> per-user basis for a second or so (or the number of docs in memory
> reaches a certain point). After the threshold has passed, the docs are
> committed to disk."
> 
> However, unless I'm missing something (quite likely ;-)), there is no
> "stored in memory on a per-user basis" or any check for when "the
> number of docs in memory reaches a certain point". All it seems to do
> is spawn a new process so the update happens when the Erlang scheduler
> gets around to it. In fact, I don't see any reference to the
> batch_save_interval and batch_save_size configuration options in the
> code.

The wiki describes the 0.10 implementation of batch=ok.  In 0.11 batch mode 
takes advantage of the fact that couch_db_updater always merges all waiting 
updates to a DB into a single write, and so doesn't bother with the separate 
set of supervised processes accumulating documents.  In effect the 0.11 
batch=ok is "I'm not going to wait around, but save this as soon as you get a 
chance".

This does change the performance characteristics quite a bit; in particular, 
when the underlying disk is fast the new batch=ok behavior will result in 
significantly larger uncompacted databases.
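
The effect can be illustrated with a toy model. This is not CouchDB's Erlang
code: `count_commits`, the arrival times, and the commit durations are all
assumptions made up for the sketch. It models a single writer that, whenever
it is free, merges every update currently waiting into one write. A faster
writer finds fewer updates waiting each time, so it performs more, smaller
writes for the same documents, which is one plausible reading of why the
uncompacted file grows.

```python
def count_commits(arrival_times, commit_duration):
    """Toy model: return how many writes a single merging writer performs.

    arrival_times must be sorted; the time unit is arbitrary.
    """
    commits = 0
    writer_free_at = 0.0
    i = 0
    n = len(arrival_times)
    while i < n:
        # The writer starts once it is free and at least one update waits.
        start = max(writer_free_at, arrival_times[i])
        # Merge every update that has arrived by the time the write starts.
        while i < n and arrival_times[i] <= start:
            i += 1
        commits += 1
        writer_free_at = start + commit_duration
    return commits


# Hypothetical workload: 100 updates, one per time unit.
arrivals = [i * 1.0 for i in range(100)]
fast_disk = count_commits(arrivals, 0.5)   # writer always keeps up
slow_disk = count_commits(arrivals, 10.0)  # updates pile up between writes

print(fast_disk, slow_disk)  # 100 11
```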

> Shouldn't batch=ok send the doc off to some background process that
> accumulates docs until either the batch interval or size threshold has
> been reached? This would also ensure that batch=ok updates are handled
> in the order they arrive, although I'm not sure if that matters given
> that the user has basically said they don't care if it succeeds or not
> by using batch=ok.

I think the document updates are still handled in the order in which they were 
received.

> 
> - Matt


Best, Adam
