Chris Anderson wrote:
On Wed, Jun 3, 2009 at 12:03 PM, Justin Balthrop <jus...@geni.com> wrote:
Nice! That sounds like exactly what I'm looking for. I don't think it will
address the performance issues with reduce, but it's definitely a start.
Do you mind sending a diff of your changes to couch_view_updater.erl? I
diffed your file with trunk and there are a bunch of unrelated changes, of
course.
There's also Paul Davis's Cascade:
http://github.com/davisp/cascade/tree/master
I'm planning on writing something with Hovercraft that takes a group
reduce query and copies it to another database on demand. It wouldn't
try to be incremental, just provide for easy chaining.
I think chaining by copying to a db is a good way to work, because it
lets you experiment with other views on top of your reduce rows,
without regenerating the whole thing.
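A rough sketch of the copy step, in Python rather than Hovercraft/Erlang. It assumes the group reduce rows have already been fetched as the usual list of {"key": ..., "value": ...} objects and only builds a payload in the shape that _bulk_docs accepts. The function name and the choice of deriving _id from the JSON-encoded key are assumptions made for illustration, not anything Hovercraft provides:

```python
import json


def rows_to_bulk_docs(rows):
    """Turn group-reduce rows into a payload shaped for POST /<target_db>/_bulk_docs.

    Hypothetical helper: the _id scheme (JSON-encoded reduce key) is an
    assumption, chosen so that each reduce key maps to a stable document id.
    """
    docs = []
    for row in rows:
        docs.append({
            "_id": json.dumps(row["key"], sort_keys=True),
            "key": row["key"],
            "value": row["value"],
        })
    return {"docs": docs}


# Rows as they would come back from a view queried with group=true.
reduce_rows = [
    {"key": "alice", "value": 2},
    {"key": "bob", "value": 1},
]
payload = rows_to_bulk_docs(reduce_rows)
```

Because nothing here is incremental, re-running the copy against an existing target database would also need the current _rev of each document, which this sketch leaves out.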
Chris
Thanks
On Jun 3, 2009, at 1:42 AM, Viacheslav Seledkin wrote:
Justin Balthrop wrote:
Hi everyone,
I've been reading the dev and user mailing lists for the past month or
so, but haven't posted yet. I've fallen in love with couchdb, its
power and simplicity, and I tell everyone who will listen why it is so
much better than a relational db for most applications. I now have
most of the engineering team at our company on board, and I'm in the
process of converting our rails site from postgres to couchdb.
So, after spending a few weeks converting models over to using
couchdb, there is one feature that we are desperately missing:
Multi-level map-reduce in views.
We need a way to take the output of reduce and pass it back through
another map-reduce step (multiple times in some cases). This way, we
could build map-reduce flows that compute (and cache) any complex data
computation we need.
Our specific use case isn't incredibly important, because multi-level
map-reduce could be useful in countless ways, but I'll include it
anyway just as illustration. The specific need for us arose from the
desire to slice up certain very large documents to make concurrent
editing by a huge number of users feasible. Then we started to use a
view step to combine the data back into whole documents. This worked
really well at first, but we soon found that we needed to run
additional queries on those documents. So we were stuck with either:
1) do the queries in the client - meaning we lose all the power and
caching of couchdb views; or
2) reinsert the combined documents into another database - meaning we
are storing the data twice, and we still have to deal with contention
when modifying the compound documents in that database.
Multi-level map-reduce would solve this problem perfectly!
Multi-level views could also simplify and improve performance for
reduce grouping. The reduce itself would work just like Google's
map-reduce by only reducing values that have the exact same map key. Then
if you want to reduce further, you can just use another map-reduce
step on top of that with the map emitting a different key so the
reduce data will be grouped differently. For example, if you wanted a
count of posts per user and total posts, you would implement it as a
two-level map-reduce with the key=user_id for map1 and the key=null
for map2.
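To make the posts example concrete, here is a minimal in-memory sketch in Python. It is not CouchDB code: map_reduce, map1, map2 and the sample posts are all made up for illustration, and nothing here is incremental or cached. It only shows the second level consuming the reduce rows of the first, with each reduce seeing only values that share the exact same key:

```python
from collections import defaultdict


def map_reduce(rows, map_fn, reduce_fn):
    """Run one map-reduce level: group emitted values by exact key, then reduce each group."""
    groups = defaultdict(list)
    for row in rows:
        for key, value in map_fn(row):
            groups[key].append(value)
    return [{"key": key, "value": reduce_fn(values)} for key, values in groups.items()]


# Hypothetical sample documents.
posts = [
    {"user_id": "alice", "title": "first"},
    {"user_id": "bob", "title": "second"},
    {"user_id": "alice", "title": "third"},
]


def map1(doc):
    # Level 1: key = user_id, one count per post.
    yield doc["user_id"], 1


def map2(row):
    # Level 2: key = null (None), carrying the per-user count forward.
    yield None, row["value"]


posts_per_user = map_reduce(posts, map1, sum)
total_posts = map_reduce(posts_per_user, map2, sum)
```

Here posts_per_user holds one row per user and total_posts holds a single row under the null key, so each grouping is read directly rather than re-reduced at query time.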
This way, you only calculate reduce values for groupings you care
about, and any particular reduce value is immediately available from
the cached B+tree values without further computation. There is more
burden on the user to specify ahead of time which groupings they need,
but the performance and flexibility would be well worth it. This
eliminates the need to store reduce values internally in the map
B+tree. But it does mean that you would need a B+tree for each reduce
grouping to keep incremental reduce updates fast. The improved
performance comes from the fact that view queries would never need to
aggregate reduce values across multiple nodes or do any re-reducing.
Does this make sense? What do you guys think? Have you discussed the
possibility of such a feature?
I'd be happy to discuss it further and even help with the
implementation, though I've only done a little bit of coding in
Erlang. I'm pretty sure this would mean big changes to the couchdb
internals, so I want to get your opinions and criticisms before I get
my hopes up or dive into any coding.
Cheers,
Justin Balthrop
Possible solution, I use it in my production ...
https://issues.apache.org/jira/browse/COUCHDB-249
--
Chris Anderson
http://jchrisa.net
http://couch.io
Attached patch to support multilevel views
https://issues.apache.org/jira/browse/COUCHDB-249