On Tue, Jun 2, 2009 at 7:20 PM, Justin Balthrop <jus...@geni.com> wrote:
> Hi everyone,
>
> I've been reading the dev and user mailing lists for the past month or so,
> but haven't posted yet. I've fallen in love with couchdb, its power and
> simplicity, and I tell everyone who will listen why it is so much better
> than a relational db for most applications. I now have most of the
> engineering team at our company on board, and I'm in the process of
> converting our rails site from postgres to couchdb.
>
> So, after spending a few weeks converting models over to using couchdb,
> there is one feature that we are desperately missing:
>
> Multi-level map-reduce in views.
>
> We need a way to take the output of reduce and pass it back through another
> map-reduce step (multiple times in some cases). This way, we could build
> map-reduce flows that compute (and cache) any complex data computation we
> need.
>
> Our specific use case isn't incredibly important, because multi-level
> map-reduce could be useful in countless ways, but I'll include it anyway
> just as illustration. The specific need for us arose from the desire to
> slice up certain very large documents to make concurrent editing by a huge
> number of users feasible. Then we started to use a view step to combine the
> data back into whole documents. This worked really well at first, but we
> soon found that we needed to run additional queries on those documents. So
> we were stuck with either:
Hey there,

Would you mind explaining what those additional queries are?

Stephan

> 1) do the queries in the client - meaning we lose all the power and caching
> of couchdb views; or
>
> 2) reinsert the combined documents into another database - meaning we are
> storing the data twice, and we still have to deal with contention when
> modifying the compound documents in that database.
>
> Multi-level map-reduce would solve this problem perfectly!
>
> Multi-level views could also simplify and improve performance for reduce
> grouping. The reduce itself would work just like Google's map-reduce by only
> reducing values that have the exact same map key. Then if you want to reduce
> further, you can just use another map-reduce step on top of that with the
> map emitting a different key so the reduce data will be grouped differently.
> For example, if you wanted a count of posts per user and total posts, you
> would implement it as a two-level map-reduce with the key=user_id for map1
> and the key=null for map2.
>
> This way, you only calculate reduce values for groupings you care about, and
> any particular reduce value is immediately available from the cached B+tree
> values without further computation. There is more burden on the user to
> specify ahead of time which groupings they need, but the performance and
> flexibility would be well worth it. This eliminates the need to store reduce
> values internally in the map B+tree. But it does mean that you would need a
> B+tree for each reduce grouping to keep incremental reduce updates fast. The
> improved performance comes from the fact that view queries would never need
> to aggregate reduce values across multiple nodes or do any re-reducing.
>
> Does this make sense? What do you guys think? Have you discussed the
> possibility of such a feature?
>
> I'd be happy to discuss it further and even help with the implementation,
> though I've only done a little bit of coding in Erlang.
> I'm pretty sure this
> would mean big changes to the couchdb internals, so I want to get your
> opinions and criticisms before I get my hopes up or dive into any coding.
>
> Cheers,
> Justin Balthrop

--
Stephan Wehner
-> http://stephan.sugarmotor.org (blog and homepage)
-> http://www.thrackle.org
-> http://www.buckmaster.ca
-> http://www.trafficlife.com
-> http://stephansmap.org
-- http://blog.stephansmap.org
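[Editor's note] Justin's two-level example (count of posts per user via key=user_id in map1, then total posts via key=null in map2) can be illustrated with a small sketch. CouchDB does not provide this feature, so the `mapReduce` helper and the sample docs below are hypothetical plain JavaScript that only simulates the data flow he describes: reduce runs per exact map key, and the second level's map consumes the first level's reduced rows.

```javascript
// Hypothetical helper simulating one map-reduce level: group emitted
// values by exact key (Google-style), then reduce each group once.
function mapReduce(rows, map, reduce) {
  const groups = new Map();
  for (const row of rows) {
    map(row, (key, value) => {
      const k = JSON.stringify(key);
      if (!groups.has(k)) groups.set(k, []);
      groups.get(k).push(value);
    });
  }
  return [...groups.entries()].map(([k, values]) => ({
    key: JSON.parse(k),
    value: reduce(values),
  }));
}

// Sample documents (hypothetical).
const posts = [
  { user_id: "alice" },
  { user_id: "alice" },
  { user_id: "bob" },
];

// Level 1: map emits key=user_id, reduce counts posts per user.
const perUser = mapReduce(
  posts,
  (doc, emit) => emit(doc.user_id, 1),
  (values) => values.reduce((a, b) => a + b, 0)
);

// Level 2: map emits key=null over level 1's output, reduce sums
// the per-user counts into a single total.
const total = mapReduce(
  perUser,
  (row, emit) => emit(null, row.value),
  (values) => values.reduce((a, b) => a + b, 0)
);

console.log(perUser);
console.log(total);
```

The point of the sketch is that each level's reduced rows are cached and re-mapped, so a query for either grouping reads a precomputed value instead of re-reducing across B+tree nodes at query time.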