On 01/12/2008, at 6:09 PM, Ben Bangert wrote:
I want to solve what I thought was a fairly simple problem, though
unfortunately it seems to be rather tricky. I asked on the IRC
channel and got some good input, but neither suggestion really seemed
like a very good solution.
The problem:
I want to allow users to rate things. They do this very frequently,
so I can't store the ratings in the actual document being rated;
instead I have Rating documents. It's easy enough to write a
map/reduce that gives me the computed average rating for a given
document. However, it seems to be impossible to get a listing of the
highest rated documents, as I can only get the computed rating for
one document at a time.
The possible solutions:
- Buffer rating additions, then at a later time, run through them
and calculate the new average rating, store it in the document as
computed_rating, so I can order on that in the key
- Cron a job that goes in and looks for new ratings every 5 mins or
whatever, and then does the same as the previous solution by storing
it in a computed_rating field
I'm not a fan of either of these: #1 means that if my webapp
hiccups, I lose ratings, and #2 means constantly sweeping the db for
new Rating documents and then updating all the rated documents,
which is a pain.
Are there really no other solutions that don't require me to store
the computed rating in the doc itself? Is there no way I can order
on the value from the map/reduce, rather than only being able to
order on the key?
I use an external query handler to solve problems that don't fit
map/reduce.
I've modified Paul Davis's _external handler to pass the current
update_seq whenever an external query is made. In the external process
(ruby in my case) I maintain a SQLite database that I can use for
queries that Couch isn't suited for. Whenever a query comes in, I
compare the supplied update_seq with one stored in my sqlite db (of
course I cache that in memory as long as the process lives). If the
sqlite db is out of date, then I do a _all_docs_by_seq and update
sqlite, including the update_seq record, before doing the (SQL) query
and responding with a JSON document in the same format as Couch would.
I can delete the sqlite db at any time (well, while the process is
stopped) because it will get recreated/updated when a query comes in
as necessary. This works with replication.
This system has the same lazy-update characteristics as Couch views,
and has the additional advantage that you can do in-memory caching in
the external process, which, depending on your update frequency, means
you rarely hit the db.
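To make the lazy-update idea concrete, here is a rough sketch of the
external process's logic. It is written in Python rather than the Ruby
I actually use, and it is only an illustration under assumptions: the
RatingIndex class, the fetch_changes callable (a stand-in for reading
_all_docs_by_seq), the table names, and the type/target/score fields
on Rating documents are all hypothetical. The HTTP and _external
plumbing is left out.

```python
import json
import sqlite3


class RatingIndex:
    """Lazily updated SQLite mirror of Rating documents (hypothetical schema)."""

    def __init__(self, fetch_changes, path=":memory:"):
        # fetch_changes(since) stands in for a read of _all_docs_by_seq; it is
        # assumed to return (seq, doc) pairs with seq > since, in seq order.
        self.fetch_changes = fetch_changes
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS ratings "
            "(id TEXT PRIMARY KEY, target TEXT, score REAL)"
        )
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value INTEGER)"
        )
        row = self.db.execute(
            "SELECT value FROM meta WHERE key = 'update_seq'"
        ).fetchone()
        # Keep the stored update_seq cached in memory while the process lives.
        self.seq = row[0] if row else 0

    def sync(self, update_seq):
        """Apply pending changes only if Couch has moved past our stored seq."""
        if update_seq <= self.seq:
            return
        for seq, doc in self.fetch_changes(self.seq):
            if doc.get("_deleted"):
                self.db.execute("DELETE FROM ratings WHERE id = ?", (doc["_id"],))
            elif doc.get("type") == "rating":
                self.db.execute(
                    "INSERT OR REPLACE INTO ratings VALUES (?, ?, ?)",
                    (doc["_id"], doc["target"], doc["score"]),
                )
            self.seq = seq
        # Record the new update_seq alongside the data it describes.
        self.db.execute(
            "INSERT OR REPLACE INTO meta VALUES ('update_seq', ?)", (self.seq,)
        )
        self.db.commit()

    def top_rated(self, update_seq, limit=10):
        """Return a Couch-style JSON result ordered by average rating."""
        self.sync(update_seq)
        rows = self.db.execute(
            "SELECT target, AVG(score) AS mean_score FROM ratings "
            "GROUP BY target ORDER BY mean_score DESC, target LIMIT ?",
            (limit,),
        ).fetchall()
        return json.dumps(
            {"total_rows": len(rows),
             "rows": [{"key": target, "value": mean} for target, mean in rows]}
        )
```

Each incoming query would call top_rated with the update_seq passed
along by the handler. Because both tables are rebuilt from the change
feed, deleting the SQLite file only costs a full resync on the next
query.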
Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd