On Dec 1, 2008, at 3:02 AM, Antony Blakey wrote:
On 01/12/2008, at 6:09 PM, Ben Bangert wrote:
I want to solve what I thought was a fairly simple problem, though
unfortunately it turns out to be rather tricky. I asked on the IRC
channel and got some good input, but none of the suggestions seemed
like a very good solution.
The problem:
I want to allow users to rate things. They do this very frequently,
so I can't store the ratings in the actual document being rated;
instead I have separate Rating documents. It's easy enough to write
a map/reduce that gives me the computed average rating for a given
document, but it seems to be impossible to get a listing of the
highest-rated documents, as I can only get the computed rating for
one document at a time.
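For concreteness, the map/reduce Ben describes might look like the
sketch below. The field names (type, rated_id, stars) are invented for
illustration, not his actual schema, and the in-process "demo" at the
bottom only imitates what CouchDB does with the two functions (in a
real design doc, emit is a global rather than a parameter):

```javascript
// Map: key each row by the id of the document being rated.
// (emit is passed in here only so the sketch runs standalone.)
var map = function (doc, emit) {
  if (doc.type === "rating") {
    emit(doc.rated_id, doc.stars);
  }
};

// Reduce: return [sum, count] so partial results combine correctly on
// rereduce; the average is sum / count at query time.
var reduce = function (keys, values, rereduce) {
  var sum = 0, count = 0;
  for (var i = 0; i < values.length; i++) {
    if (rereduce) { sum += values[i][0]; count += values[i][1]; }
    else          { sum += values[i];    count += 1; }
  }
  return [sum, count];
};

// Tiny stand-in for what CouchDB does with group=true.
var docs = [
  { type: "rating", rated_id: "movie-1", stars: 4 },
  { type: "rating", rated_id: "movie-1", stars: 2 },
  { type: "rating", rated_id: "movie-2", stars: 5 }
];
var rows = [];
docs.forEach(function (doc) {
  map(doc, function (k, v) { rows.push({ key: k, value: v }); });
});
var byKey = {};
rows.forEach(function (r) {
  (byKey[r.key] = byKey[r.key] || []).push(r.value);
});
var averages = {};
Object.keys(byKey).forEach(function (k) {
  var sc = reduce([k], byKey[k], false);
  averages[k] = sc[0] / sc[1];
});
// averages is { "movie-1": 3, "movie-2": 5 } -- still keyed (and
// ordered) by document id, not by rating, which is Ben's complaint.
```

This works fine for fetching one document's average, but the view's
rows are sorted by key, so there is no way to ask it for "top N by
value".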
The possible solutions:
- Buffer rating additions, then at a later time run through them,
calculate the new average rating, and store it in the document as
computed_rating so I can order on it in the key
- Cron a job that goes in and looks for new ratings every 5 mins or
whatever, and then does the same as the previous solution by
storing it in a computed_rating field
I'm not a fan of either of these, because #1 means if my webapp
hiccups, I lose ratings, and #2 means constantly sweeping the db
for new Rating documents and then going through and updating all
the affected documents.
Are there really no other solutions that don't require me to store
the computed rating in the doc itself? Is there no way I can order
on the value from the map/reduce, rather than only being able to
order on the key?
I use an external query handler to solve problems that don't fit map/
reduce.
I've modified Paul Davis's _external handler to pass the current
update_seq whenever an external query is made. In the external
process (Ruby in my case) I maintain an SQLite database that I can
use for queries that Couch isn't suited for. Whenever a query comes
in, I compare the supplied update_seq with the one stored in my
SQLite db (which of course I cache in memory for as long as the
process lives). If the SQLite db is out of date, I do an
_all_docs_by_seq request and update SQLite, including the
update_seq record, before running the (SQL) query and responding
with a JSON document in the same format as Couch would.
I can delete the sqlite db at any time (well, while the process is
stopped) because it will get recreated/updated when a query comes in
as necessary. This works with replication.
This system has the same lazy-update characteristics as Couch views,
and has the additional advantage that you can do in-memory caching
in the external process, which, depending on your update frequency,
means you rarely hit the db.
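The catch-up-on-query mechanism Antony describes can be sketched
roughly as below. This is not his actual code: the Couch
_all_docs_by_seq feed and the SQLite index are both simulated here by
plain in-memory objects, and every name is invented for illustration.
The point is only the control flow — compare the supplied update_seq
with the stored one, replay anything newer into the index, then answer
the query:

```javascript
// Stand-in for CouchDB: docs plus a monotonically increasing update_seq.
var couch = { updateSeq: 0, bySeq: [] };
function couchPut(doc) {
  couch.updateSeq += 1;
  couch.bySeq.push({ seq: couch.updateSeq, doc: doc });
}
// Stand-in for an _all_docs_by_seq request starting after sinceSeq.
function allDocsBySeq(sinceSeq) {
  return couch.bySeq.filter(function (r) { return r.seq > sinceSeq; });
}

// Stand-in for the external process's SQLite db: running sums/counts
// per rated document, plus the update_seq it was last synced at.
var index = { seq: 0, sums: {}, counts: {} };

// Called on every external query: catch the index up first, then answer
// in the same row shape Couch itself would use.
function query(currentSeq, ratedId) {
  if (index.seq < currentSeq) {
    allDocsBySeq(index.seq).forEach(function (r) {
      var d = r.doc;
      index.sums[d.rated_id]   = (index.sums[d.rated_id]   || 0) + d.stars;
      index.counts[d.rated_id] = (index.counts[d.rated_id] || 0) + 1;
      index.seq = r.seq;
    });
  }
  return { key: ratedId, value: index.sums[ratedId] / index.counts[ratedId] };
}

// The query map/reduce can't do: documents ordered by average rating.
function topRated() {
  return Object.keys(index.sums)
    .map(function (k) {
      return { key: k, value: index.sums[k] / index.counts[k] };
    })
    .sort(function (a, b) { return b.value - a.value; });
}

couchPut({ rated_id: "movie-1", stars: 4 });
couchPut({ rated_id: "movie-1", stars: 2 });
var row = query(couch.updateSeq, "movie-1"); // lazily syncs, then answers
// row.value is 3
```

Because the index only records the highest seq it has applied, the
SQLite file can be thrown away at any time (while the process is
stopped) and the next query will rebuild it from seq 0, which is why
the scheme survives replication.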
Excellent! What you have created, I think, is a custom view engine. I
hope to see more of this (Lucene full-text indexing support is
another example) — maybe you could write up how you did it? I can
review the code or design first, if that helps.
-Damien