On Jul 3, 2009, at 6:37 PM, Chris Anderson wrote:

2009/7/3 Göran Krampe <[email protected]>:
Hi folks!

We are writing an app using CouchDB where we tried to do some map/reduce to calculate "period sums" for about 1000 different "accounts". This is fiscal data, btw; the system is meant to store detailed fiscal data for about 50000 companies, for starters. :)

The map function is trivial: it just emits a bunch of "accountNo, amount" pairs with "month" as the key.

The reduce/rereduce takes these and builds a dictionary (JSON object) with "month-accountNo" as the key (like "2009/10-2335") and the sum as the value. This works fine; yes, it builds up a bit, but there is a maximum number of accounts and months, so it doesn't grow out of control. That is NOT the issue.
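For concreteness, the functions are roughly of this shape (a simplified sketch only; the real document fields and value format in our app differ a bit):

    // map: emits one row per posting, keyed by month
    function(doc) {
      if (doc.month && doc.accountNo && doc.amount) {
        emit(doc.month, {accountNo: doc.accountNo, amount: doc.amount});
      }
    }

    // reduce/rereduce: builds up a "month-accountNo" -> sum dictionary
    function(keys, values, rereduce) {
      var sums = {};
      for (var i = 0; i < values.length; i++) {
        if (rereduce) {
          // values[i] is a dictionary of partial sums from a lower level; merge it
          for (var k in values[i]) {
            sums[k] = (sums[k] || 0) + values[i][k];
          }
        } else {
          // keys[i] is [emittedKey, docId]; the emitted key is the month
          var key = keys[i][0] + "-" + values[i].accountNo;
          sums[key] = (sums[key] || 0) + values[i].amount;
        }
      }
      return sums;
    }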

There is *no reason ever* to build up a dictionary with more than a small handful of items in it. E.g., it's OK if your dictionary has this fixed set of keys: count, total, stddev, avg.

It's not OK to do what you are doing. This is what group_level is for. Rewrite your map/reduce to be correct and then we can start talking about performance.
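Concretely, that means emitting a composite key and letting the grouping happen at query time. A minimal sketch (your field names will differ):

    // map: composite key [month, accountNo], a plain number as the value
    function(doc) {
      if (doc.month && doc.accountNo && typeof doc.amount === "number") {
        emit([doc.month, doc.accountNo], doc.amount);
      }
    }

    // reduce: the reduction value stays a single number, never a growing dictionary
    function(keys, values, rereduce) {
      var total = 0;
      for (var i = 0; i < values.length; i++) {
        total += values[i];
      }
      return total;
    }

Query with ?group=true (or ?group_level=2) to get one sum per [month, accountNo] pair, and ?group_level=1 to roll the same index up per month.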

I don't mean to be harsh but suggesting you have a performance problem
here is like me complaining that my Ferrari makes a bad boat.

Cheers,
Chris

Wow, that was unusually harsh coming from you, Chris. Taking a closer look at Göran's map and reduce functions, I agree that they should be reworked to make use of group=true, but I nevertheless wonder if we do have something to work on here.

I believe Göran's problem was that the second pass was causing the view updater process to use a significant amount of memory and trigger should_flush() immediately. As a result, view KVs were being written to disk after every document (triggering the reduce/rereduce step). This is fantastically inefficient. If the penalty for flushing results to disk during indexing is so severe, perhaps we want to be a little more flexible in imposing it. There could be very legitimate cases where users with large documents and/or sophisticated workflows are hung out to dry during indexing because the view updater wants a measly 11MB of memory to do its job.

Adam

Ok, here comes the punchline. When we dump the first 1000 docs using bulk, which typically amounts to, say, 5000 emits, and we "touch" the view to trigger it, it is rather fast and behaves like this:

- a single Erlang process runs and emits all values, then it does a bunch of reduces on those values, and finally it switches into rereduce mode and does those. You can see the dictionary "growing" a bit, but never too much. It is pretty fast, a second or two all in all.

Fine. Then we dump the *next* 1000 docs into Couch and trigger the view again. This time it behaves like this (believe it or not):

- two Erlang processes come into play. It seems the same process as above continues with emits (IIRC), but a second one starts doing reduce/rereduce *while the first one is emitting*.

This is actually by design.

Ouch. And to make it worse, the second one seems to gradually "take over" until we only see 2-3 emits followed by tons of rereduces (all the way up, I guess, for each emit).

This is not.

Sooo... evidently Couch decides to do stuff in parallel and starts doing reduce/rereduce while emitting here. AFAIK this is not the behavior described.

Not sure if it's described, but it is by design. The reduce function executes when the btree is modified. We can't afford to cache KVs from an index update in memory regardless of size; we have to set some threshold at which we flush them to disk.
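To spell out what that means for the function: every flush rewrites btree nodes, and each touched inner node gets its stored reduction recomputed by calling the reduce with rereduce=true over the reductions of its children. So a flush per document means the rereduce path runs over and over. Illustratively (the keys, values, and doc ids here are made up; this is the simple sum shape, not your dictionary):

    // a plain sum reduce, the same shape as any CouchDB reduce function
    function reduce(keys, values, rereduce) {
      var total = 0;
      for (var i = 0; i < values.length; i++) total += values[i];
      return total;
    }

    // initial reduce: keys are [emittedKey, docId] pairs from a leaf node
    reduce([[["2009/10", 2335], "doc-1"], [["2009/10", 2336], "doc-2"]], [100, 250], false);  // -> 350

    // rereduce: keys is null, values are the stored reductions of child nodes;
    // this runs again for every inner node touched each time KVs are flushed
    reduce(null, [350, 1200], true);  // -> 1550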

I think the fundamental question is why the flush operations were occurring so frequently the second time around. Is it because you were building up a largish hash for the reduce value? Probably. Nevertheless, I'd like to have a better handle on that.

Adam

The net effect is that the view update that took 1-2 seconds suddenly takes 400 seconds, or slows to a total crawl and never seems to end.

Looking at the log, it obviously processes ONE doc at a time, giving us typically 2-5 emits, and then tries to reduce those all the way up to the root before processing the next doc. So the rereduces for the internal nodes are, in this case, run roughly 1000x more often than needed.

Phew. :) Ok, so we are basically hosed with this behavior in this situation.
I can only presume this has gone unnoticed because:

a) The updates most of us do are small. But we dump thousands of new docs using bulk (a full new fiscal year of data for a given company), so we definitely notice it.

b) Most reduce/rereduce functions are very, very fast, so it goes unnoticed. Our functions are NOT that fast, but if they were only run as often as they should be (well, presuming they *should* only be run after all the emits for all doc changes in a given view update), it would indeed be fast anyway. We can see that since the first 1000 docs work fine.

...and thanks to the people on #couchdb for discussing this with me earlier today and looking at the Erlang code to try to figure it out. I think Adam Kocoloski and Robert Newson had some idea about it.

regards, Göran

PS. I am on vacation now for 4 weeks, so I will not be answering much email. I wanted to get this posted though since it is in some sense a rather ...
serious performance bottleneck.



--
Chris Anderson
http://jchrisa.net
http://couch.io
