Hi everyone,

I've been reading the dev and user mailing lists for the past month or so, but haven't posted yet. I've fallen in love with couchdb, its power and simplicity, and I tell everyone who will listen why it is so much better than a relational db for most applications. I now have most of the engineering team at our company on board, and I'm in the process of converting our rails site from postgres to couchdb.

So, after spending a few weeks converting models over to couchdb, we've found one feature that we are desperately missing:

Multi-level map-reduce in views.

We need a way to take the output of reduce and pass it back through another map-reduce step (multiple times in some cases). This way, we could build map-reduce flows that compute (and cache) whatever complex derived data we need.
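
To make the shape of the idea concrete, here is one purely hypothetical way a chained view might be declared in a design document. The "source" field doesn't exist in CouchDB; I invented it here to mark which view's reduced rows feed the next map, and I'm assuming each of those rows arrives as a {key, value} pair. The example rolls event counts up by day, then by month, then overall:

// Purely hypothetical design document (a JS object literal so comments work).
var designDoc = {
  _id: "_design/rollups",
  views: {
    by_day: {
      // ordinary CouchDB map/reduce over documents; doc.date is "YYYY-MM-DD"
      map: "function(doc) { emit(doc.date, 1); }",
      reduce: "function(keys, values, rereduce) { return sum(values); }"
    },
    by_month: {
      source: "by_day", // invented: input is by_day's reduced {key, value} rows
      map: "function(row) { emit(row.key.slice(0, 7), row.value); }",
      reduce: "function(keys, values, rereduce) { return sum(values); }"
    },
    total: {
      source: "by_month", // and reduce output can be mapped yet again
      map: "function(row) { emit(null, row.value); }",
      reduce: "function(keys, values, rereduce) { return sum(values); }"
    }
  }
};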

Our specific use case isn't incredibly important, because multi-level map-reduce could be useful in countless ways, but I'll include it anyway just as an illustration. The specific need for us arose from the desire to slice up certain very large documents to make concurrent editing by a huge number of users feasible. Then we started to use a view step to combine the data back into whole documents. This worked really well at first, but we soon found that we needed to run additional queries on those combined documents. So we were stuck with either:

1) do the queries in the client - meaning we lose all the power and caching of couchdb views; or

2) reinsert the combined documents into another database - meaning we are storing the data twice, and we still have to deal with contention when modifying the compound documents in that database.

Multi-level map-reduce would solve this problem perfectly!
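
For what it's worth, the combining view we use today looks roughly like the sketch below (simplified, with placeholder field names and merge logic). The rows it produces are exactly the compound documents we then want to map and reduce again, which is where we hit the wall.

function map(doc) {
  // each slice of a large document carries a pointer back to its parent
  if (doc.type === "slice") {
    emit(doc.parent_id, doc);
  }
}

function reduce(keys, values, rereduce) {
  // merge slice documents (or already-merged partials, on rereduce)
  // into one combined object per parent_id
  var combined = {};
  for (var i = 0; i < values.length; i++) {
    for (var field in values[i]) {
      combined[field] = values[i][field];
    }
  }
  return combined;
}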

Multi-level views could also simplify and improve performance for reduce grouping. The reduce itself would work just like Google's map-reduce by only reducing values that have the exact same map key. Then if you want to reduce further, you can just use another map-reduce step on top of that, with the map emitting a different key so the reduce data is grouped differently. For example, if you wanted a count of posts per user and total posts, you would implement it as a two-level map-reduce with key=user_id for map1 and key=null for map2.
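
Sketched out, that example might look something like this. The map1/reduce1 half works in CouchDB today; the map2/reduce2 half is of course hypothetical, and I'm assuming the second-level map is handed one {key, value} row from the first reduce at a time:

// Level 1: count posts per user (ordinary CouchDB map/reduce).
function map1(doc) {
  if (doc.type === "post") {
    emit(doc.user_id, 1);
  }
}
function reduce1(keys, values, rereduce) {
  return sum(values);
}

// Level 2 (hypothetical): runs over level 1's reduced rows, i.e. one
// {key: user_id, value: post_count} pair at a time, and regroups them
// under a single null key to get the total number of posts.
function map2(row) {
  emit(null, row.value);
}
function reduce2(keys, values, rereduce) {
  return sum(values);
}

Each level would then be queried like any other view: per-user counts from level 1, and the grand total from level 2.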

This way, you only calculate reduce values for groupings you care about, and any particular reduce value is immediately available from the cached B+tree values without further computation. There is more burden on the user to specify ahead of time which groupings they need, but the performance and flexibility would be well worth it. This eliminates the need to store reduce values internally in the map B+tree. But it does mean that you would need a B+tree for each reduce grouping to keep incremental reduce updates fast. The improved performance comes from the fact that view queries would never need to aggregate reduce values across multiple nodes or do any re-reducing.

Does this make sense? What do you guys think? Have you discussed the possibility of such a feature?

I'd be happy to discuss it further and even help with the implementation, though I've only done a little bit of coding in Erlang. I'm pretty sure this would mean big changes to the couchdb internals, so I want to get your opinions and criticisms before I get my hopes up or dive into any coding.

Cheers,
Justin Balthrop
