[ 
https://issues.apache.org/jira/browse/COUCHDB-568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776646#action_12776646
 ] 

Damien Katz commented on COUCHDB-568:
-------------------------------------

I know this isn't what you are trying to achieve with the batching, but I just 
remembered something.

An easy way to parallelize actions and use more CPUs is to spawn_link a 
process to query/modify a sub node with a matching batch of keys/actions, then 
move on to the next sub node and its matching keys/actions and do the same, and 
so on. Then you wait for your sub-pids to return their results. The maximum 
number of recursive sub-processes active at any time would be K*Log(N), where K 
is the number of keys we are querying/modifying and N is the total number of 
entries in the index. In practice it wouldn't be that many processes, but we 
might want to limit the number of processes spawned anyway, depending on tuning.

I think that actually would be a pretty good way to use more of the CPUs while 
updating an index, assuming you are CPU bound on writes. For reads of multiple 
keys, it likely would also be a performance win.
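
A rough sketch of the fan-out described above, for the read case only. It is 
written in Python rather than Erlang purely for illustration: threads stand in 
for spawn_link'ed processes (and, unlike Erlang processes, will not actually 
use more CPUs in CPython). The Node layout and the lookup_batch name are 
assumptions made for this example, not couch_btree's real structures or API.

```python
import threading
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical layout: a leaf stores entries in keys/values; an inner
    # node stores, in keys[i], the largest key found under children[i].
    keys: list
    values: list = field(default_factory=list)
    children: list = field(default_factory=list)


def lookup_batch(node, keys):
    """Look up a batch of keys, using one worker per matching sub node."""
    if not node.children:
        entries = dict(zip(node.keys, node.values))
        return {k: entries.get(k) for k in keys}

    # Split the batch so each sub node only receives the keys it may hold.
    groups = {}
    for k in keys:
        i = min(bisect_left(node.keys, k), len(node.children) - 1)
        groups.setdefault(i, []).append(k)

    results = [None] * len(groups)

    def worker(slot, child, sub_keys):
        results[slot] = lookup_batch(child, sub_keys)

    # Start one worker per sub node that has matching keys, then move on.
    workers = []
    for slot, (i, sub_keys) in enumerate(groups.items()):
        t = threading.Thread(target=worker,
                             args=(slot, node.children[i], sub_keys))
        t.start()
        workers.append(t)

    # Wait for every worker to hand back its result, then merge them.
    merged = {}
    for t in workers:
        t.join()
    for partial in results:
        merged.update(partial)
    return merged
```

Each level starts at most one worker per key in its batch, which is where the 
K*Log(N) bound comes from. This sketch places no cap on the number of workers; 
a real implementation would probably want one.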

> When delayed_commits = true, keep updated btree nodes in memory until the 
> commit
> --------------------------------------------------------------------------------
>
>                 Key: COUCHDB-568
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-568
>             Project: CouchDB
>          Issue Type: Improvement
>    Affects Versions: 0.10
>            Reporter: Adam Kocoloski
>
> rnewson reported on IRC that the new batch=ok implementation results in 
> significantly larger overhead in the .couch files.  This makes sense; the old 
> batch mode waited 1 second before saving, but the new implementation just 
> updates the doc asynchronously.  With fast hardware and moderate write rates 
> it's likely that each document is being written separately.
> The overhead presumably arises from frequently updated btree inner nodes 
> being written to disk many times over.  I'm interested in exploring a 
> modification of the delayed_commits mode whereby the updated btree nodes are 
> not actually written to disk immediately, but are instead held in memory 
> until the commit.  I'd like to think that this will result in more compact 
> files without any decrease in durability.  New read requests would still be 
> able to access these in-memory nodes.
> I realize the notion that updates go directly to disk is baked pretty deeply 
> into couch_btree, but I still thought this was worth bringing up to a wider 
> audience.
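
The buffering idea in the issue description above can be sketched roughly as 
follows. This is a Python illustration under simplifying assumptions, not 
CouchDB code: the BufferedNodeStore class is hypothetical, nodes are addressed 
by a stable id (couch_btree actually refers to nodes by file position), and a 
list stands in for the append-only .couch file.

```python
class BufferedNodeStore:
    """Holds updated nodes in memory and appends them only on commit()."""

    def __init__(self):
        self.file = []     # stand-in for the append-only database file
        self.on_disk = {}  # node id -> index of its latest written copy
        self.dirty = {}    # node id -> latest copy not yet written

    def write(self, node_id, data):
        # A later update replaces the earlier unwritten copy, so a
        # frequently updated inner node is written once per commit.
        self.dirty[node_id] = data

    def read(self, node_id):
        # Reads see in-memory nodes first, then fall back to the file.
        if node_id in self.dirty:
            return self.dirty[node_id]
        return self.file[self.on_disk[node_id]]

    def commit(self):
        # Append every pending node once, then forget the buffer.
        for node_id, data in self.dirty.items():
            self.on_disk[node_id] = len(self.file)
            self.file.append(data)
        self.dirty.clear()
```

With this shape, three updates to the same inner node between commits add one 
entry to the file instead of three, while reads in between still return the 
newest copy.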

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
