On Jul 3, 2009, at 6:37 PM, Chris Anderson wrote:
2009/7/3 Göran Krampe <[email protected]>:
Hi folks!
We are writing an app using CouchDB where we tried to do some map/reduce to calculate "period sums" for about 1000 different "accounts". This is fiscal data, btw; the system is meant to store detailed fiscal data for about 50,000 companies, for starters. :)
The map function is trivial: it just emits a bunch of "accountNo, amount" pairs with "month" as the key.

The reduce/rereduce take these and build a dictionary (JSON object) with "month-accountNo" as the key (like "2009/10-2335") and the sum as the value. This works fine; yes, it builds up a bit, but there is a maximum number of account numbers and months, so it doesn't grow out of control. That is NOT the issue.
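Roughly, the pattern being described looks like this (a sketch only; doc.month, doc.accountNo and doc.amount are guessed field names, not the actual functions):

function map(doc) {
  // one emit per posting: month as the key, account and amount as the value
  emit(doc.month, {accountNo: doc.accountNo, amount: doc.amount});
}

function reduce(keys, values, rereduce) {
  // builds one dictionary keyed "month-accountNo" holding running sums
  var sums = {};
  if (rereduce) {
    for (var i = 0; i < values.length; i++) {
      var partial = values[i];
      for (var k in partial) sums[k] = (sums[k] || 0) + partial[k];
    }
  } else {
    for (var j = 0; j < values.length; j++) {
      var key = keys[j][0] + "-" + values[j].accountNo;  // e.g. "2009/10-2335"
      sums[key] = (sums[key] || 0) + values[j].amount;
    }
  }
  return sums;
}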
There is *no reason ever* to build up a dictionary with more than a small handful of items in it. E.g., it's OK if your dictionary has this fixed set of keys: count, total, stddev, avg.

It's not OK to do what you are doing. This is what group_level is for. Rewrite your map/reduce to be correct and then we can start talking about performance.
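Concretely, the usual CouchDB shape for this kind of per-period sum is something like the following (again a sketch with the same guessed field names; the key change is a composite key plus a scalar reduce value):

function map(doc) {
  // composite key: group_level=1 gives per-month totals,
  // group=true gives per-month-per-account totals
  emit([doc.month, doc.accountNo], doc.amount);
}

function reduce(keys, values, rereduce) {
  // values are either emitted amounts or partial sums; either way, add them up
  var total = 0;
  for (var i = 0; i < values.length; i++) total += values[i];
  return total;
}

Queried with ?group=true (or ?group_level=1 for per-month totals across all accounts), each row comes back as one small pair, e.g. {"key": ["2009/10", 2335], "value": 1234.56}, so no single reduce value ever holds the whole dictionary.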
I don't mean to be harsh, but suggesting you have a performance problem here is like me complaining that my Ferrari makes a bad boat.
Cheers,
Chris
Wow, that was unusually harsh coming from you, Chris. Taking a closer look at Göran's map and reduce functions I agree that they should be reworked to make use of group=true, but nevertheless I wonder if we do have something to work on here.
I believe Göran's problem was that the second pass was causing the view updater process to use a significant amount of memory and trigger should_flush() immediately. As a result, view KVs were being written to disk after every document (triggering the reduce/rereduce step).

This is fantastically inefficient. If the penalty for flushing results to disk during indexing is so severe, perhaps we want to be a little more flexible in imposing it. There could be very legitimate cases where users with large documents and/or sophisticated workflows are hung out to dry during indexing because the view updater wants a measly 11MB of memory to do its job.
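To make the mechanics concrete, here is a rough JavaScript sketch of the idea (this is not the actual Erlang in the view updater, and whether the check is on process memory or buffered bytes is an internal detail; the point is only that every flush modifies the btree and therefore runs reduce/rereduce):

function updateView(docs, mapFun, writeToBtree, thresholdBytes) {
  // Buffer emitted KVs and write them into the view btree whenever the
  // buffer crosses a size threshold. Each writeToBtree() call is where
  // reduce/rereduce runs for every node on the modified path.
  var buffer = [];
  var bufferedBytes = 0;
  docs.forEach(function (doc) {
    mapFun(doc).forEach(function (kv) {
      buffer.push(kv);
      bufferedBytes += JSON.stringify(kv).length;
    });
    if (bufferedBytes > thresholdBytes) {   // the should_flush() decision
      writeToBtree(buffer);                 // reduce/rereduce happens here
      buffer = [];
      bufferedBytes = 0;
    }
  });
  if (buffer.length > 0) writeToBtree(buffer);
}

With a huge reduce value the threshold is crossed after almost every document, so you pay the rereduce cost per document instead of once per batch.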
Adam
Ok, here comes the punchline. When we dump the first 1000 docs using bulk - which typically will amount to, say, 5000 emits - and we "touch" the view to trigger it (the HTTP calls are sketched below), it is rather fast and behaves like this:

- a single Erlang process runs and emits all values, then it does a bunch of reduces on those values, and finally it switches into rereduce mode and does those; you can see the dictionary "growing" a bit but never too much. It is pretty fast, a second or two all in all.
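For reference, the load-and-touch cycle over HTTP is roughly this (database, design doc and view names are placeholders; fetch is only used to keep the sketch runnable):

const base = "http://localhost:5984/fiscal";

async function loadBatchAndTouchView(docs) {
  // 1. bulk-insert a batch of documents
  await fetch(base + "/_bulk_docs", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ docs: docs })
  });

  // 2. "touch" the view; indexes are built lazily, so this GET is what
  //    actually kicks off the map/reduce work for the newly inserted docs
  const resp = await fetch(base + "/_design/app/_view/period_sums?group=true");
  return resp.json();
}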
Fine. Then we dump the *next* 1000 docs into Couch and trigger the view again. This time it behaves like this (believe it or not):

- two Erlang processes get into play. It seems the same process as above continues with emits (IIRC), but a second one starts doing reduce/rereduce *while the first one is emitting*.
This is actually by design.
Ouch. And to make it worse - the second one seems to gradually "take over" until we only see 2-3 emits followed by tons of rereduces (all the way up, I guess, for each emit).
This is not.
Sooo... evidently Couch decides to do stuff in parallel and starts doing reduce/rereduce while emitting here. AFAIK this is not the behavior described.
Not sure if it's described, but it is by design. The reduce function executes when the btree is modified. We can't afford to cache KVs from an index update in memory regardless of size; we have to set some threshold at which we flush them to disk.

I think the fundamental question is why the flush operations were occurring so frequently the second time around. Is it because you were building up a largish hash for the reduce value? Probably. Nevertheless, I'd like to have a better handle on that.
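(Back of the envelope, with only illustrative numbers: ~1000 accounts times, say, 12 months gives up to ~12,000 keys in the dictionary. At a few tens of bytes per entry that is a few hundred KB per reduce value, and a copy of it sits at every inner btree node touched on each flush, so a memory budget on the order of 10MB is exhausted almost immediately.)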
Adam
The net effect is that the view update that took 1-2 seconds suddenly takes 400 seconds, or goes to a total crawl and never seems to end.

By looking at the log it obviously processes ONE doc at a time - giving us 2-5 emits typically - and then tries to reduce that all the way up to the root before processing the next doc. So the rereduces for the internal nodes will, in this case, be run typically 1000x more often than needed.
Phew. :) Ok, so we are basically hosed with this behavior in this situation.

I can only presume this has gone unnoticed because:

a) The updates most of us do are small. But we dump thousands of new docs using bulk (a full new fiscal year of data for a given company), so we definitely notice it.

b) Most reduce/rereduce functions are very, very fast, so it goes unnoticed.
Our functions are NOT that fast - but if they were only run as often as they should be (well, presuming they *should* only be run after all the emits for all doc changes in a given view update), it would indeed be fast anyway. We can see that since the first 1000 docs work fine.
...and thanks to the people on #couchdb for discussing this with me earlier today and looking at the Erlang code to try to figure it out. I think Adam Kocoloski and Robert Newson had some idea about it.

regards, Göran
PS. I am on vacation now for 4 weeks, so I will not be answering much email. I wanted to get this posted though, since it is in some sense a rather ... serious performance bottleneck.