I've been contemplating implementing a new feature that I've been wanting for awhile. There's been some talk of implementing view intersections for a bit now so I figured I'd try and give a summary of what the feature would entail in terms of functionality and then the necessary bits required for an implementation.
So the original idea for view intersections was exactly what the name entails: Show me the intersection between two views for a given set of view query parameters. After thinking about different methods of implementation I think we can extend this to be more powerful and generally applicable. Major Hurdle 1 ============ The first necessary bit of ground work would be to implement an optional value index on views. The more I thought about intersecting views the more I realized it was starting to look pointless. Ignoring something along the lines of group_level=N in that we can join on array prefixes, all views being joined would require exactly the same key. Which begs the question, why not just create 1 view that emits the output of the two you want intersected. I couldn't get past this for a long time until I heard Chris Anderson pondering adding a btree index to the values in a view. The obvious downfalls of the extra space and computation usage are there, but making it optional should solve any qualms in that respect. Given an index on a value we're now able to chain together arbitrary views using either the key or value as well as limit the intersection by any combination of key and value. As a side benefit, we would also get the select views by value restriction as well. I'm thinking it'd be as transparent as adding a [start|end]value and [start|end]value_docid set of URL parameters. I haven't followed this train of thought too far into the code yet, but something approximating that should be fairly doable. A thought occurs that limiting view results by both key and value could be interesting in terms of implementation. Not sure if I'd force it through the intersection API or not. Caveats that come to mind are that this would break binary compatibility for all generated views. It wouldn't require a dump/reload, but it might come as a surprise to people upgrading that all their views are regenerating. Major Hurdle 2 ============ Implementing the view intersection API. First off, it probably needs a new name. Once we have intersections working, unions, subtractions, and the NxM one who's name escapes me (cross product floats up but sounds not right) should be trivially implementable. The underlying implementation for this is basically a large merge sort running over the view btree's. If you read about the merge step in map/reduce/merge that's basically what I've got in my head. The biggest issue that I've found in getting this implemented (excluding a value index) is that I'd need to write a new btree traversal method that used iterators instead of a fold mechanism. This shouldn't be overly difficult to implement. Beyond that then it's basically up to the HTTP interface in parameter parsing and error checking. For passing parameters I'm thinking along the line of a JSON body posted (Premptive: any RESTafarians should reference the long discussion on multi-get before writing about how this isn't RESTful). Also, not sure if it's obvious but I'd plan on allowing arbitrarily nested conditions, ie, "intersection(union(a, b), c)" type of operations. There's a subtle detail in the sort order and thus corresponding btree traversal that might come into play there. I can punt and make the entire request use one sort order, as in the previous example can't specify different sort directions for the two nested operations because you'd get a (presumably) zero overlap in the end. I'm pretty sure if we force all btrees to be traversed in the same direction for each request we don't lose any functionality though. Comments ========= That's the general outline I've got in my head right now. I'm pretty sure I can see 95% of the implementation, but it's possible I'm missing a finer detail somewhere. If you've got questions or comments let's hear them. If there's no general objection then I can probably get to starting an implementation at the end of this week. Thanks, Paul Davis
