Fundamentally, the issue is that updating a view is processing an incoming, ordered list of changes, there's not much parallelism to be had there. The reason sharding works is that you then have two smaller lists which can be processed in parallel (but each still processed serially). And, yes, my thought was that BigCouch will allow multiple shards of the database on a single node, allowing parallel view builds within that node.
Some care must be taken if we pursue optimizing a single view build. In the context of a production system, with more than one database and more than one view, all the cores get plenty of exercise. If every task could run on all cores then I think you might hit other issues. B. On 11 May 2012 13:35, Dirkjan Ochtman <dirk...@ochtman.nl> wrote: > On Fri, May 11, 2012 at 2:29 PM, Robert Newson <rnew...@apache.org> wrote: >> Making a single view indexing process faster keeps coming up. For one >> thing, it's not that easy, otherwise it would have been done by now. >> For another, this problem vanishes when you shard (and the BigCouch >> merge will bring this to CouchDB). > > What does sharding mean in this context? Running CouchDB/BigCouch on > multiple servers, or just running multiple processes on a single box? > If the latter, why can't we run multiple threads/Erlang processes > within a single shard/OS process? If the former, that's kind of silly, > in the sense that building indexes (at least for me) is CPU-bound but > leaves many of the cores in my server idle. > > Cheers, > > Dirkjan