On 14 April 2010 13:23, Adam Kocoloski <[email protected]> wrote:
> On Apr 14, 2010, at 7:59 AM, Matt Goodall wrote:
>
>> Hi,
>>
>> Over in couchdb-python land someone wanted to use batch=ok when
>> creating and updating documents, so we added support.
>>
>> I was semi-surprised to notice that _bulk_docs does not support
>> batch=ok. I realise _bulk_docs essentially is a batch update but a
>> _bulk_docs batch=ok would presumably allow CouchDB to buffer more in
>> memory before writing to disk. What are your thoughts?
>
> It's probably of limited utility. If you're already batching on the client
> side, you can achieve the same effect by sending in a larger batch. I'm not
> opposed to it per se, just don't think it will help with throughput all that
> much.

:nod: given the new behaviour I'm inclined to agree.

>
>>
>> Now, this buffering is where the "implementation concerns" come in.
>> According to the wiki:
>>
>> "There is a query option batch=ok which can be used to achieve higher
>> throughput at the cost of lower guarantees. When a PUT (or a document
>> POST as described below) is sent using this option, it is not
>> immediately written to disk. Instead it is stored in memory on a
>> per-user basis for a second or so (or the number of docs in memory
>> reaches a certain point). After the threshold has passed, the docs are
>> committed to disk."
>>
>> However, unless I'm missing something (quite likely ;-)), there is no
>> "stored in memory on a per-user basis" or any check for when "the
>> number of docs in memory reaches a certain point". All it seems to do
>> is spawn a new process so the update happens when the Erlang scheduler
>> gets around to it. In fact, I don't see any reference to the
>> batch_save_interval and batch_save_size configuration options in the
>> code.
>
> The wiki describes the 0.10 implementation of batch=ok. In 0.11 batch mode
> takes advantage of the fact that couch_db_updater always merges all waiting
> updates to a DB into a single write, and so doesn't bother with the separate
> set of supervised processes accumulating documents. In effect the 0.11
> batch=ok is "I'm not going to wait around, but save this as soon as you get a
> chance".

Ah, I didn't dig far enough into the code to see that happening.

So, purely for my understanding, it's now simplified to a delayed
commit that happens at most 1000ms after normal changes are received.
Anything that causes the commit to happen earlier cancels the pending
commit.

Does that mean that batch="ok" with delayed_commits=false is meaningless?

Anyway, it sounds like the two batch_save config options should be
removed from etc/couchdb/default.ini.tpl.in.
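
For anyone following along, the kind of request under discussion is an ordinary document PUT with the batch=ok query option appended. A minimal sketch of building one with the Python standard library follows. The helper name build_batch_put, the server address, and the database and document names are made up for illustration, and nothing is sent over the network here.

```python
import json
import urllib.parse
import urllib.request


def build_batch_put(base_url, db, doc_id, doc):
    """Build (but do not send) a PUT request for one document with batch=ok.

    Hypothetical helper for illustration; base_url, db and doc_id are
    whatever the caller's CouchDB setup uses.
    """
    # Percent-encode the path segments so odd database or document ids stay valid.
    path = "/".join(urllib.parse.quote(part, safe="") for part in (db, doc_id))
    url = f"{base_url.rstrip('/')}/{path}?batch=ok"
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Assumed local server address and made-up names.
    req = build_batch_put("http://localhost:5984", "mydb", "doc1", {"x": 1})
    print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen() against a running server would actually send it. As far as I know CouchDB answers a batch=ok write with 202 Accepted rather than 201 Created, which matches the "I'm not going to wait around" reading above.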
>
> This does change the performance characteristics quite a bit; in particular,
> when the underlying disk is fast the new batch=ok behavior will result in
> significantly larger uncompacted databases.

Agh, this suggests I didn't understand the updater's behaviour. A large
uncompacted database normally means lots of small additions to the
database file. How does fast disk speed affect that?

>
>> Shouldn't batch=ok send the doc off to some background process that
>> accumulates docs until either the batch interval or size threshold has
>> been reached? This would also ensure that batch=ok updates are handled
>> in the order they arrive, although I'm not sure if that matters given
>> that the user has basically said they don't care if it succeeds or not
>> by using batch=ok.
>
> I think the document updates are still handled in the order in which they
> were received.
>
>>
>> - Matt
>
>
> Best, Adam
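
One possible reading of the "merges all waiting updates into a single write" behaviour, and of why disk speed would matter, can be sketched as a toy model. This is not couch_db_updater code: the function name, the integer time units and the arrival pattern are all assumptions made for illustration. The writer here, whenever it is free, takes every update that has already arrived and writes them as one batch, in arrival order.

```python
def simulate_merged_writes(arrival_times, write_duration):
    """Return the size of each write made by a toy 'merge whatever is waiting' writer.

    arrival_times: times at which single-document updates arrive (arbitrary units).
    write_duration: how long one write keeps the writer busy (same units).
    """
    pending = sorted(arrival_times)
    batches = []
    i = 0
    now = 0
    while i < len(pending):
        # Nothing waiting: idle until the next update arrives.
        if pending[i] > now:
            now = pending[i]
        # Merge every update that has arrived by now into one write.
        j = i
        while j < len(pending) and pending[j] <= now:
            j += 1
        batches.append(j - i)
        i = j
        # The writer is busy for the duration of this write.
        now += write_duration
    return batches


if __name__ == "__main__":
    arrivals = list(range(10))  # one update per time unit
    print("fast disk:", simulate_merged_writes(arrivals, write_duration=1))
    print("slow disk:", simulate_merged_writes(arrivals, write_duration=3))
```

In this model a fast writer never lets updates queue up, so ten updates become ten one-document writes, while a slower writer ends up merging several updates per write. If each separate write adds its own overhead to an append-only file, that would be consistent with faster disks producing larger uncompacted databases, but that is an inference from the thread, not something confirmed against the code.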
