2009/7/4 Göran Krampe <[email protected]>:
> Adam Kocoloski wrote:
>>
>> Not sure if it's described, but it is by design. The reduce function
>> executes when the btree is modified. We can't afford to cache KVs from an
>> index update in memory regardless of size; we have to set some threshold
>> when we flush them to disk.
>
> And I presume you can't write KVs *without* doing the reduce?
>
> When I wrote "described" I am referring to the blog post by Ricky Ho btw. It
> seems to imply a strict ordering, map -> reduce -> rereduce. IIRC.
>
That was probably just the theoretical aspect. Maps always happen first,
obviously. Then, when the key/values are inserted into the btree during a
flush, the entire tree is built, which means that at least one reduce is
called and then re-reduces are run to fill out the tree. At the moment we
aren't delaying re-reduce calls because it'd require a major overhaul to the
btree code.

>> I think the fundamental question is why the flush operations were
>> occurring so frequently the second time around. Is it because you were
>> building up a largish hash for the reduce value? Probably. Nevertheless,
>> I'd like to have a better handle on that.
>
> Yeah, well, I am on vacation now - but some other guys are not. We could of
> course start by trying to rewrite this the Right Way first as Chris said.
>
> I am curious if it can be done using grouping because we dismissed grouping
> due to its relatively slow performance (it runs lots of reduces at query
> time IIRC) :)
>
> Btw, the solution used now DOES return the map for a full year in about 230
> ms, including parsing on client side. So query time was perfectly fine, but
> view generation was not. This shows to me that it *can* work.
>
> regards, Göran
>
>
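
To make the ordering described above concrete, here is a rough sketch in Python. It is a toy model under stated assumptions, not the actual btree code: the names `count_reduce`, `chunks` and `build_tree_reduction` and the `fanout` parameter are all made up for illustration, and the tree is rebuilt bottom-up from scratch. It only shows that a flush triggers reduce calls on the leaf rows and re-reduce calls on the inner nodes right away.

```python
# Toy model only: hypothetical names, not CouchDB's btree implementation.

def count_reduce(keys, values, rereduce):
    """Count rows. On rereduce, 'values' holds earlier reduce results."""
    if rereduce:
        return sum(values)
    return len(values)


def chunks(items, size):
    """Split a list into consecutive groups of at most 'size' items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_tree_reduction(kvs, reduce_fn, fanout=2, log=None):
    """Return the root reduction for (key, value) pairs flushed in one go.

    'fanout' is an assumed node size; 'log' optionally records each call.
    """
    if not kvs:
        return None
    rows = sorted(kvs, key=lambda kv: kv[0])

    # Leaf level: reduce runs over the mapped rows themselves.
    level = []
    for leaf in chunks(rows, fanout):
        keys = [k for k, _ in leaf]
        values = [v for _, v in leaf]
        level.append(reduce_fn(keys, values, False))
        if log is not None:
            log.append("reduce")

    # Inner levels: rereduce combines child reductions up to the root.
    while len(level) > 1:
        parents = []
        for group in chunks(level, fanout):
            parents.append(reduce_fn(None, group, True))
            if log is not None:
                log.append("rereduce")
        level = parents
    return level[0]


calls = []
rows = [("a", 1), ("b", 1), ("c", 1), ("d", 1), ("e", 1)]
total = build_tree_reduction(rows, count_reduce, fanout=2, log=calls)
```

In this model every flush pays for both kinds of call, so a large reduce value (such as a big hash) makes each of them more expensive.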
