Paul,

alright... you've gotta give me the remedial explanation of what you meant here! (sorry, i'm still noob-ish)
so, are you saying that i shouldn't even check for individual doc
updates, but instead just recreate the entire database? that sounds
like a job for cron, more so than the update notifier, right?

i'd put up my current ruby script, but it deals with update
notifications in a way that's very specific to my data (probably very
naïvely, to boot!)

zach

On Mon, Apr 27, 2009 at 11:28 AM, Paul Davis <[email protected]> wrote:
> Zachary,
>
> Awesome. The thing with non-incremental updates is that the basic
> algorithm would be to just look for updates to the view and, on
> update, delete the review DB, create a new one, and then dump the new
> data into it. I wouldn't try too hard to optimize updates at this
> point in time.
>
> Getting a ruby script out to show the basics should probably be the
> first step. Beyond that we'll have to take it one step at a time.
>
> HTH,
> Paul Davis
>
>
> Zachary Zolton wrote:
>>
>> @jchris et al,
>>
>> if you have any pointers on how to implement this, i have a strong
>> motivation to try my hand at it.
>>
>> i have a janky ruby script running as an update notifier that looks
>> for certain criteria, idiomatic to my data, and puts docs into a
>> derived database. but i'm not terribly happy with my current
>> implementation...
>>
>> is there a general-purpose algorithm for dealing with updates?
>>
>> cheers,
>>
>> zach
>>
>> On Sun, Apr 26, 2009 at 10:20 PM, Chris Anderson <[email protected]> wrote:
>>>
>>> Sent from my iPhone
>>>
>>> On Apr 26, 2009, at 2:26 PM, Wout Mertens <[email protected]> wrote:
>>>
>>>> Hi Adam,
>>>>
>>>> On Apr 22, 2009, at 4:48 PM, Adam Kocoloski wrote:
>>>>
>>>>> Hi Wout, thanks for writing this up.
>>>>>
>>>>> One comment about the map-only views: I think you'll find that
>>>>> Couch has already done a good bit of the work needed to support
>>>>> them, too. Couch maintains a btree for each design doc, keyed on
>>>>> docid, that stores all the view keys emitted by the maps over
>>>>> each document. When a document is updated and then analyzed,
>>>>> Couch has to consult that btree, purge all the KVs associated
>>>>> with the old version of the doc from each view, and then insert
>>>>> the new KVs. So the tracking information correlating docids and
>>>>> view keys is already available.
>>>>
>>>> See, I did not know that :-) Although I should have guessed.
>>>>
>>>> However, in the mail before this one I argued that it doesn't make
>>>> sense to combine or chain map-only views, since you can always
>>>> write a map function that does it in one step. Do you agree?
>>>>
>>>> You might also know the answer to this: is it possible to make the
>>>> Review DB a sort of view index on the current database? All it
>>>> needs is JSON keys and values, no other fields.
>>>>
>>>>> You'd still be left with the problem of generating unique docids
>>>>> for the documents in the Review DB, but I think that's a problem
>>>>> that needs to be solved. The restriction to only MR views with no
>>>>> duplicate keys across views seems too strong to me.
>>>>
>>>> Well, since the Review DB is a local(*) hidden database that's
>>>> handled a bit specially, I think the easiest approach is to assign
>>>> _id a sequence number and create a default view that indexes the
>>>> documents by doc.key (for updating the value for that key). There
>>>> will never be contention, and we're only interested in the key
>>>> index.
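
(an aside to make Wout's Review DB shape concrete: this is a purely
hypothetical sketch, since nothing like it exists in CouchDB today;
the docid is a sequence number, and a default view indexes the docs by
their "key" field so the value for a given key can be looked up and
updated)

    # hypothetical review-DB document; _id is just a sequence number
    review_doc = {
      '_id'   => '0000017',
      'key'   => ['tag', 'couchdb'],  # the view key this row came from
      'value' => 42                   # the (re)reduced value for that key
    }

    # default index over the review DB, keyed on doc.key
    by_key = { 'map' => 'function(doc) { emit(doc.key, null); }' }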
>>>
>>> We discussed this a little at CouchHack, and I argued that the
>>> simplest solution is actually good for a few reasons.
>>>
>>> The simple solution: provide a mechanism to copy the rows of a
>>> grouped reduce function to a new database.
>>>
>>> Good because it is most like Hadoop/Google-style map reduce. In
>>> that paradigm, the output of a map/reduce job is not incremental,
>>> and it is persisted in a way that allows multiple later reduce
>>> stages to be run on it. It's common in Hadoop to chain many m/r
>>> stages, and to try a few iterations of each stage while developing
>>> code.
>>>
>>> I like this also because it provides the needed functionality
>>> without adding any new primitives to CouchDB.
>>>
>>> The only downside of this approach is that it is not incremental.
>>> I'm not sure that incremental chainability has much promise, as the
>>> index management could be a pain, especially if you have branching
>>> chains.
>>>
>>> Another upside is that by reducing to a db, you give the user the
>>> power to do things like use replication to merge multiple data sets
>>> before applying more views.
>>>
>>> I don't want to discourage anyone from experimenting with code; I
>>> just want to point out this simple solution, which would be Very
>>> Easy to implement.
>>>
>>>> (*) local: I'm assuming that views are not replicated and need to
>>>> be recalculated for each CouchDB node. If they are replicated
>>>> somehow, I think it would still work, but we'd have to look at it
>>>> a little more.
>>>>
>>>>> With that said, I'd prefer to spend my time extending the view
>>>>> engine to handle chainable MR workflows in a single shot.
>>>>> Especially in the simple sort_by_value case, it just seems like a
>>>>> cleaner way to go about things.
>>>>
>>>> Yes, that seems to be the gist of all repliers, and I agree :-)
>>>>
>>>> In a nutshell, I'm hoping that:
>>>> * A review is a new sort of view that has an "inputs" array in its
>>>>   definition.
>>>> * Only MR views are allowed as inputs; no KV duplication allowed.
>>>> * It builds a persistent index of the incoming views when those
>>>>   get updated.
>>>> * That index is then used to build the view index for the review
>>>>   when the review gets updated.
>>>> * I think I covered the most important algorithms needed to
>>>>   implement this in my original proposal.
>>>>
>>>> Does this sound feasible? If so, I'll update my proposal
>>>> accordingly.
>>>>
>>>> Wout.
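
(again purely speculative, just to visualize Wout's nutshell: a
made-up design-doc shape for a review with an "inputs" array; every
field name here is invented for illustration and is not part of
CouchDB)

    # invented shape for illustration only
    review_def = {
      '_id'     => '_design/sort_by_value',
      'reviews' => {
        'by_value' => {
          # only MR views may feed a review; keys must be unique
          # across the listed inputs
          'inputs' => ['_design/stats/_view/totals'],
          'map'    => 'function(doc) { emit(doc.value, doc.key); }'
        }
      }
    }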

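for the archive, a minimal ruby sketch of the non-incremental approach
Paul and Chris describe: delete the review DB, recreate it, and dump
the grouped reduce rows of a source view into it as docs. it assumes a
local CouchDB on 127.0.0.1:5984 and made-up database/view names, and
uses only net/http from the standard library plus the json gem.

    require 'net/http'
    require 'json'

    HOST, PORT = '127.0.0.1', 5984

    # tiny helper around CouchDB's HTTP API; verb is a Net::HTTP
    # request class name such as 'Get' or 'Put'
    def couch(verb, path, body = nil)
      Net::HTTP.start(HOST, PORT) do |http|
        req = Net::HTTP.const_get(verb).new(path)
        req['Content-Type'] = 'application/json'
        req.body = body.to_json if body
        JSON.parse(http.request(req).body)
      end
    end

    # non-incremental update: drop the review DB and rebuild it from
    # the grouped reduce rows of the source view
    def rebuild_review_db(view_path, review_db)
      couch('Delete', "/#{review_db}")   # a 404 on the first run is fine
      couch('Put',    "/#{review_db}")
      rows = couch('Get', "#{view_path}?group=true")['rows']
      docs = rows.map { |r| { 'key' => r['key'], 'value' => r['value'] } }
      couch('Post', "/#{review_db}/_bulk_docs", 'docs' => docs)
    end

    # e.g. run this from cron rather than the update notifier:
    rebuild_review_db('/mydb/_design/stats/_view/totals', 'mydb_review')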