Re: batch=ok for bulk_docs and single doc implementation concerns

Matt Goodall Wed, 14 Apr 2010 06:38:38 -0700

On 14 April 2010 13:23, Adam Kocoloski <[email protected]> wrote:
> On Apr 14, 2010, at 7:59 AM, Matt Goodall wrote:
>
>> Hi,
>>
>> Over in couchdb-python land someone wanted to use batch=ok when
>> creating and updating documents, so we added support.
>>
>> I was semi-surprised to notice that _bulk_docs does not support
>> batch=ok. I realise _bulk_docs essentially is a batch update but a
>> _bulk_docs batch=ok would presumably allow CouchDB to buffer more in
>> memory before writing to disk. What are your thoughts?
>
> It's probably of limited utility.  If you're already batching on the client 
> side, you can achieve the same effect by sending in a larger batch.  I'm not 
> opposed to it per se, just don't think it will help with throughput all that 
> much.


:nod: given the new behaviour I'm inclined to agree.
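
For reference, the two request shapes being compared can be sketched in
Python. This only builds the requests and sends nothing; the base URL,
database name and helper names are assumptions made up for illustration,
and this is not how couchdb-python does it.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE = "http://localhost:5984"  # assumed address of a local CouchDB
JSON_HEADERS = {"Content-Type": "application/json"}


def single_doc_request(db, doc_id, doc):
    """Build a PUT for one document with the batch=ok query option."""
    url = f"{BASE}/{quote(db, safe='')}/{quote(doc_id, safe='')}?batch=ok"
    body = json.dumps(doc).encode("utf-8")
    return Request(url, data=body, headers=JSON_HEADERS, method="PUT")


def bulk_docs_request(db, docs):
    """Build a POST to _bulk_docs; the batching happens on the client."""
    url = f"{BASE}/{quote(db, safe='')}/_bulk_docs"
    body = json.dumps({"docs": docs}).encode("utf-8")
    return Request(url, data=body, headers=JSON_HEADERS, method="POST")
```

Passing either request to urllib.request.urlopen() would send it. With
batch=ok CouchDB answers 202 Accepted rather than 201 Created, since the
document has not been written yet.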

>
>>
>> Now, this buffering is where the "implementation concerns" come in.
>> According to the wiki:
>>
>> "There is a query option batch=ok which can be used to achieve higher
>> throughput at the cost of lower guarantees. When a PUT (or a document
>> POST as described below) is sent using this option, it is not
>> immediately written to disk. Instead it is stored in memory on a
>> per-user basis for a second or so (or the number of docs in memory
>> reaches a certain point). After the threshold has passed, the docs are
>> committed to disk."
>>
>> However, unless I'm missing something (quite likely ;-)), there is no
>> "stored in memory on a per-user basis" or any check for when "the
>> number of docs in memory reaches a certain point". All it seems to do
>> is spawn a new process so the update happens when the Erlang scheduler
>> gets around to it. In fact, I don't see any reference to the
>> batch_save_interval and batch_save_size configuration options in the
>> code.
>
> The wiki describes the 0.10 implementation of batch=ok.  In 0.11 batch mode 
> takes advantage of the fact that couch_db_updater always merges all waiting 
> updates to a DB into a single write, and so doesn't bother with the separate 
> set of supervised processes accumulating documents.  In effect the 0.11 
> batch=ok is "I'm not going to wait around, but save this as soon as you get a 
> chance".

Ah, I didn't dig far enough into the code to see that happening.
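
A toy Python model of the 0.11 behaviour as Adam describes it, purely to
make the idea concrete. It is not CouchDB code: ToyUpdater and its methods
are invented names, and the real couch_db_updater is an Erlang process.

```python
import queue


class ToyUpdater:
    """Merges every update waiting in the queue into a single write."""

    def __init__(self):
        self.pending = queue.Queue()
        self.writes = []  # each entry stands for one write to the db file

    def submit(self, doc):
        # batch=ok style: queue the doc and return without waiting
        self.pending.put(doc)
        return 202

    def run_once(self):
        # Drain whatever is waiting right now and write it in one go.
        batch = []
        while True:
            try:
                batch.append(self.pending.get_nowait())
            except queue.Empty:
                break
        if batch:
            self.writes.append(batch)
        return len(batch)
```

In this model the size of each write depends only on how many updates were
waiting when the updater got its turn; there is no separate accumulator.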

So, purely for my understanding, it's now simplified to a delayed
commit that happens at most 1000ms after normal changes are received.
Anything that causes the commit to happen earlier cancels the pending
commit.

Does that mean that batch=ok with delayed_commits=false is meaningless?

Anyway, it sounds like the two batch_save config options should be
removed from etc/couchdb/default.ini.tpl.in.
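
For context, the semantics the wiki attributes to those two options can be
sketched in Python as a buffer that flushes on a size or an age threshold.
This is a guess at the described behaviour, not the 0.10 implementation;
ToyBatchBuffer is an invented name and the caller supplies the clock and
the threshold values.

```python
class ToyBatchBuffer:
    """Holds docs in memory until a size or age threshold is reached."""

    def __init__(self, batch_save_size, batch_save_interval_ms, write):
        self.size = batch_save_size
        self.interval_ms = batch_save_interval_ms
        self.write = write  # callable that receives the list of docs
        self.docs = []
        self.first_ms = None  # arrival time of the oldest buffered doc

    def add(self, doc, now_ms):
        if not self.docs:
            self.first_ms = now_ms
        self.docs.append(doc)
        if len(self.docs) >= self.size:
            self.flush()

    def tick(self, now_ms):
        # Called periodically; flushes once the oldest doc is too old.
        if self.docs and now_ms - self.first_ms >= self.interval_ms:
            self.flush()

    def flush(self):
        if self.docs:
            self.write(self.docs)
            self.docs = []
```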

>
> This does change the performance characteristics quite a bit; in particular, 
> when the underlying disk is fast the new batch=ok behavior will result in 
> significantly larger uncompacted databases.

Agh, this suggests I didn't understand the updater's behaviour. A large
uncompacted database normally means lots of small additions to the
database file. How does fast disk speed affect that?

>
>> Shouldn't batch=ok send the doc off to some background process that
>> accumulates docs until either the batch interval or size threshold has
>> been reached? This would also ensure that batch=ok updates are handled
>> in the order they arrive, although I'm not sure if that matters given
>> that the user has basically said they don't care if it succeeds or not
>> by using batch=ok.
>
> I think the document updates are still handled in the order in which they 
> were received.
>
>>
>> - Matt
>
>
> Best, Adam
