Ryan, I have cc'd the dev list since this is of general interest. Comments inline.

On Mon, Aug 23, 2010 at 10:41 AM, Ryan Hill <[email protected]> wrote:

> Hey there - I like your contribution, but have a couple of questions (that I
> didn't want to clutter up the dev-list discussion).
>
> -- Does your logic take into account map-only results that may have
> duplicate keys? In the simple case, this could mean preserving all the
> duplicates that satisfy the multi-view constraints (i.e. intersection), if
> that behavior seems generally logical.

Yes, duplicates are kept when testing intersection.

> -- I find myself performing quite a few set operations on view results,
> currently entirely in memory (which I happen to have enough of) using python
> sets. I think your code could be generalized to take care of this, and that
> doing so would strengthen the case for inclusion into trunk. Possibly even
> more so than improving the iterative sections. Do you see this
> generalization being something you would also be interested in? Examples
> include union and difference, in addition to the current intersection being
> discussed. Composite operations (the difference result of one view taken
> w.r.t the intersection of two other views) should also be considered.

I am interested in union and difference, but I think we can do this without
holding the results in memory. Holding the results in memory is fine if you
have a few users, but I intentionally wrote the code to be streaming so that
it scales. I think getting intersection into trunk is a good first step. The
difference operation might be possible by inverting keys, and union is really
simple: just stream the results from each view, one view after another.

> -- Lastly, have you thought about this in the context of other iterative
> map-reduce implementations? Another pattern that I use frequently is to feed
> the results of one M/R that accumulates totals for a group of documents into
> a second M/R that applies a statistical test to filter out documents
> that fail to meet certain significance criteria. This is fairly easy to
> implement in something like Riak or Twister, but would be useful to have in
> couch as well.

I haven't thought about feeding the results of one map/reduce into another,
but I agree it is interesting.

> Based on my experience with couch, I think there are a finite number of
> complex view operations that should find their way into the primary code
> path, of which yours is one. I might be able to help implement a level of
> abstraction to encapsulate all such operations, if you are similarly
> interested.

I have an interest in GeoCouch and am thinking of the common query language
as a starting point for defining the view operations, but there might be
other query protocols that are better suited. Is there a JSONQL, for example?

> What do you think?

I am interested in any help, so thanks for letting me know.

> Cheers,
> -R
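
P.S. To make the streaming idea concrete, here is a minimal Python sketch. It
is not the actual patch, just an illustration under assumptions: each view is
taken to be an iterator of (key, value) rows already sorted by key, a key
"intersects" when it appears in every view, and all duplicate rows for such a
key are emitted. The names stream_intersection and stream_union are
hypothetical.

```python
from itertools import chain, groupby
from operator import itemgetter


def _groups(rows):
    """Yield (key, [rows sharing that key]) from rows sorted by key."""
    for key, group in groupby(rows, key=itemgetter(0)):
        # Only the duplicates for a single key are held in memory.
        yield key, list(group)


def stream_intersection(*views):
    """Yield rows whose key occurs in every view, keeping duplicate keys."""
    if not views:
        return
    iters = [_groups(view) for view in views]
    try:
        current = [next(it) for it in iters]
        while True:
            top = max(key for key, _ in current)
            if all(key == top for key, _ in current):
                # Every view has this key: emit all of its rows.
                for _, rows in current:
                    yield from rows
                current = [next(it) for it in iters]
            else:
                # Advance only the views that are behind the largest key.
                current = [
                    next(it) if key < top else (key, rows)
                    for it, (key, rows) in zip(iters, current)
                ]
    except StopIteration:
        # One view is exhausted, so no further key can be in all views.
        return


def stream_union(*views):
    """Stream the rows of each view in turn (no de-duplication)."""
    return chain.from_iterable(views)
```

Because both functions are generators over sorted input, memory use is
bounded by the number of duplicate rows for a single key rather than by the
size of the view results.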
