This sounds really slow, like something's wrong. 25 minutes to process
300k documents means ~200 docs/sec, or 5 ms per document. That's a
really long time, CPU-wise.
Assuming it's not another VM bug, we should be able to get that
down to under a minute with some tuning, and probably closer to 10
seconds after serious optimizations.
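The throughput figures are easy to sanity-check. This small Python helper (the name `throughput` is only for illustration) turns a document count and a wall-clock time into docs/sec and ms per document. It also shows what the targets imply: under a minute means about 5,000 docs/sec and roughly 10 seconds means about 30,000 docs/sec. Those are 25x and 150x faster than the observed 25 minutes.

```python
def throughput(docs: int, seconds: float) -> tuple[float, float]:
    """Return (documents per second, milliseconds per document)."""
    docs_per_sec = docs / seconds
    ms_per_doc = seconds * 1000 / docs
    return docs_per_sec, ms_per_doc


DOCS = 300_000

# Observed: 25 minutes for the full view build.
print(throughput(DOCS, 25 * 60))  # (200.0, 5.0)

# The targets: under a minute, and roughly 10 seconds.
for label, secs in [("1 minute", 60), ("10 seconds", 10)]:
    rate, per_doc = throughput(DOCS, secs)
    print(f"{label}: {rate:,.0f} docs/sec, {per_doc:.3f} ms/doc")
```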
-Damien
On Jul 2, 2008, at 6:28 PM, Chris Anderson wrote:
On Wed, Jul 2, 2008 at 3:08 PM, Paul Davis <[EMAIL PROTECTED]> wrote:
I'd have to go back and double-check, but off the top of my head, 25
minutes for 300K docs seems about like what I was getting. I.e., not
orders of magnitude slower or anything.
In my experience, views generate about half as fast as that, if not
more slowly. My views are often quite complex, with a lot of internal
looping and multiple emits, so that probably explains it. In short,
the times you're reporting seem reasonable.
The bottleneck (based on my extremely unscientific use of top) doesn't
seem to be the view server, but rather CouchDB's beam process, which,
as I understand it, is busy sorting the results as they come back from
the view server. So the quickest route to parallelizing this may be to
manually partition your data across CouchDB instances, generate the
views, and query them in parallel, merging the results in your
application.
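Here is a minimal Python sketch of the partition-and-merge approach. Every name in it is an assumption for illustration. `shard_for` is a hypothetical router that assigns each document to an instance by hashing its `_id`. `fetch_view` is a hypothetical callable you supply: it queries one instance's view and returns that view's rows (dicts with a `"key"` field), already sorted by key. Because every instance returns sorted rows, the application needs only a k-way merge (`heapq.merge`), not a full re-sort.

One caveat: Python's ordering agrees with CouchDB's view collation only for simple keys such as numbers. CouchDB collates strings with Unicode rules rather than by code point, so string or mixed-type keys would need a comparator that mimics CouchDB's collation.

```python
import hashlib
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator

# Assumed row shape: {"id": ..., "key": ..., "value": ...}, as a view returns it.
Row = dict


def shard_for(doc_id: str, n_shards: int) -> int:
    """Hypothetical routing: hash the document _id to pick an instance.

    Use the same function when inserting so a given _id always lands on
    the same CouchDB instance.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_shards


def scatter_gather(
    shards: list[str],
    fetch_view: Callable[[str], list[Row]],
) -> Iterator[Row]:
    """Query every shard's view in parallel and merge the key-sorted rows."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        per_shard = list(pool.map(fetch_view, shards))
    # Each shard's rows are already sorted by key, so a lazy k-way merge
    # is enough; no global re-sort is needed.
    return heapq.merge(*per_shard, key=lambda row: row["key"])


# Usage with stand-in data instead of real HTTP calls (hostnames are made up).
fake_views = {
    "http://couch-a:5984": [{"key": 1, "value": "a"}, {"key": 4, "value": "d"}],
    "http://couch-b:5984": [{"key": 2, "value": "b"}, {"key": 3, "value": "c"}],
}
merged = scatter_gather(list(fake_views), fake_views.__getitem__)
print([row["key"] for row in merged])  # [1, 2, 3, 4]
```

A thread pool suits the query step because the work is I/O-bound: it is mostly waiting on the CouchDB instances. The merge afterwards is cheap and streams rows lazily.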
I don't actually plan to do all that work until my insert rate
eclipses CouchDB's view generation speed. :)
Once upon a time there was a feature to return the available results
of a view, even while generation is still occurring. The feature has
fallen by the wayside, and it would be non-trivial to turn it back on,
according to Damien on IRC. Maybe if enough people would find it
useful, we'll see it again.
--
Chris Anderson
http://jchris.mfdz.com