I've been thinking a bit about problems that Luke and I have outlined
together related to using CouchDB to its full potential in Melkjug.
I'm going to lay out a few of those problems and a quick summary of what my
thoughts are. These ideas are all still a little half-baked so feedback
would be appreciated.

*How do we get a random subset of a query?*
When writing a view for CouchDB each document is run through a map function
to generate the view.  The map function is supposed to emit a key/value
combination (though these can each be complex types) for each document.  If
we want a random subset of documents we can make the key (or some element
of a complex key) a random integer. Then we can just take the first n
results.
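
A rough sketch of the idea, simulated in plain Python rather than run against CouchDB. The names `map_random_key`, `build_view` and `first_n` are made up for illustration; the sort in `build_view` stands in for the key ordering a view index gives us.

```python
import random


def map_random_key(doc, rng):
    """Hypothetical map function: emit a random integer as the key."""
    yield rng.randrange(2 ** 32), doc["_id"]


def build_view(docs, map_fun, rng):
    """Run every document through the map function and sort rows by key,
    which is roughly what a view index does for us."""
    rows = [row for doc in docs for row in map_fun(doc, rng)]
    rows.sort(key=lambda row: row[0])
    return rows


def first_n(rows, n):
    """Take the values of the first n rows of the sorted view."""
    return [value for _key, value in rows[:n]]


docs = [{"_id": f"article-{i}"} for i in range(20)]
view = build_view(docs, map_random_key, random.Random(42))
subset = first_n(view, 5)
```

One caveat worth checking: as far as I understand, CouchDB computes a document's view rows when the document changes, not on every query, so the random keys (and therefore the "first n") would stay the same between queries. Starting each query from a randomly chosen key might be one way around that.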

*How do we use CouchDB to distribute our filtering onto a cluster?*
Unless I'm mistaken (for instance, if only a subset of JavaScript is
available to a map function, in which case maybe we can use a Python view
server), we should be able to calculate scores for each document through a
view.  The steps needed to make this happen, as I see them, are as follows:

The filter needs to be accessible from the map function.
Either we:
 - build our own view server in Python and include our filter modules so
the map function can call them directly to calculate scores.
 - implement a RESTful pattern for calling filter modules via HTTP/JSON
(a happy side effect is the possibility of off-site filters)
 - maybe do both
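
To make the first option a bit more concrete, here is a minimal sketch of a map function written in Python that calls filter modules directly. The two filters, the `FILTERS` registry, and the `type`, `title` and `body` document fields are all invented for the example; how a real Python view server would load and register such a function is left open.

```python
# Hypothetical filters: each takes a document and returns a score in [0, 1].
def length_filter(doc):
    """Longer articles score higher, capped at 1.0 (illustrative only)."""
    return min(len(doc.get("body", "")) / 1000.0, 1.0)


def keyword_filter(doc):
    """Score 1.0 if a made-up keyword appears in the title, else 0.0."""
    return 1.0 if "couchdb" in doc.get("title", "").lower() else 0.0


# Assumed registry of filter name -> callable.
FILTERS = {"length": length_filter, "keyword": keyword_filter}


def map_scores(doc):
    """Map function: emit one row per article holding all filter scores."""
    if doc.get("type") != "article":
        return  # skip documents that are not articles
    scores = {name: score(doc) for name, score in FILTERS.items()}
    yield doc["_id"], scores


doc = {"_id": "a1", "type": "article",
       "title": "CouchDB views", "body": "x" * 500}
rows = list(map_scores(doc))
```

For the RESTful option, the body of each filter would presumably become an HTTP call returning JSON, with `map_scores` otherwise unchanged.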

*Is there anything we can do to improve our measurements of "goodness" and
our results?*
There has been discussion about the recommendation/rating algorithm for
Melkjug:
http://tinyurl.com/5s8tdh
http://tinyurl.com/5gawhk

If we wanted to get really nutty (read: awesome), it seems feasible to
implement a closest-n articles view (by Euclidean distance or dot product)
in a way that distributes. A Python view server could have views that
leverage NumPy to do the linear algebra for us. We could generate a view
for each user. For each score document (a document that stores the scores
for a given article against all filters), this view would calculate the
distance we require against the user's preference vector.  Since we cannot
pass arguments to CouchDB views, we would simply update the view whenever
the user's filtering preferences change.
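
A small sketch of the per-user distance view, again simulated in plain Python. It uses the standard library instead of NumPy to stay self-contained; with NumPy the distance line would become a vector operation. The filter names, the `scores` and `article_id` fields, and `make_distance_map` are assumptions for illustration. The preference vector is baked into the map function, which is why the view has to be regenerated when preferences change.

```python
import math

# Hypothetical, fixed ordering of filters so vectors line up.
FILTER_ORDER = ["length", "keyword", "recency"]


def make_distance_map(preferences):
    """Build a map function with one user's preference vector baked in."""
    pref = [preferences[name] for name in FILTER_ORDER]

    def map_distance(score_doc):
        scores = [score_doc["scores"][name] for name in FILTER_ORDER]
        distance = math.sqrt(
            sum((s - p) ** 2 for s, p in zip(scores, pref)))
        # Using the distance as the key sorts the view nearest-first.
        yield distance, score_doc["article_id"]

    return map_distance


score_docs = [
    {"article_id": "c",
     "scores": {"length": 0.0, "keyword": 1.0, "recency": 0.0}},
    {"article_id": "a",
     "scores": {"length": 1.0, "keyword": 0.0, "recency": 1.0}},
    {"article_id": "b",
     "scores": {"length": 0.0, "keyword": 0.0, "recency": 1.0}},
]

user_prefs = {"length": 1.0, "keyword": 0.0, "recency": 1.0}
map_fun = make_distance_map(user_prefs)

# Sorting by key stands in for the ordering a view index provides.
rows = sorted(row for doc in score_docs for row in map_fun(doc))
closest = [article for _distance, article in rows[:2]]
```

Taking the first n rows of the sorted view then gives the closest-n articles for that user.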

-Randall
