On Dec 1, 2008, at 3:02 AM, Antony Blakey wrote:


On 01/12/2008, at 6:09 PM, Ben Bangert wrote:

I want to solve what I thought was a fairly simple problem, though unfortunately it seems to be rather tricky. I asked on the IRC channel, and got some good input, but neither really seemed like a very good solution.

The problem:

I want to allow users to rate things. They do this very very frequently, so I can't store it in the actual document being rated, so I have Rating documents. It's easy enough to write a map/reduce that gives me the computed average rating for a given document, however, it seems to be impossible to get a listing of the highest rated documents, as I can only get the computed rating for a document one at a time.
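The map/reduce described above can be sketched as follows. This is a Python stand-in for what would be a JavaScript view in CouchDB itself, and the field names (`type`, `rated_id`, `value`) are hypothetical, not from the original post. It shows why the result is keyed by rated document id: you can look up one document's average, but you can't ask CouchDB to sort by it.

```python
# Sketch of the average-rating view, simulated in Python.
# map emits (rated_doc_id, rating); reduce folds to (sum, count)
# so that averages can be re-reduced correctly.

def map_rating(doc):
    # Emit one row per Rating document, keyed by the rated document's id.
    if doc.get("type") == "rating":
        yield (doc["rated_id"], doc["value"])

def reduce_ratings(keys, values):
    # Combine to a (sum, count) pair rather than a bare average.
    return (sum(values), len(values))

ratings = [
    {"type": "rating", "rated_id": "doc_a", "value": 4},
    {"type": "rating", "rated_id": "doc_a", "value": 2},
    {"type": "rating", "rated_id": "doc_b", "value": 5},
]

# Group mapped rows by key, as the view engine would.
rows = {}
for doc in ratings:
    for key, value in map_rating(doc):
        rows.setdefault(key, []).append(value)

# Reduce per key to get each document's average.
averages = {}
for key, values in rows.items():
    total, count = reduce_ratings([key] * len(values), values)
    averages[key] = total / count

print(averages)  # {'doc_a': 3.0, 'doc_b': 5.0}
```

The output is ordered by key (document id), not by average, which is exactly the limitation being asked about.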

The possible solutions:
- Buffer rating additions, then at a later time run through them, calculate the new average rating, and store it in the document as computed_rating, so I can order on that in the key
- Cron a job that goes in and looks for new ratings every 5 mins or whatever, and then does the same as the previous solution by storing it in a computed_rating field
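The cron-style sweep (option 2) might look like this minimal sketch. The in-memory dicts stand in for CouchDB documents, and `seq` plays the role of a checkpoint marking which Rating documents the last sweep has already seen; all names here are hypothetical.

```python
# Sketch of option 2: periodically fold new Rating documents into
# each rated document's stored computed_rating.

def sweep(docs, ratings, last_seen_seq):
    # Pick up only the Rating documents added since the last sweep.
    new_ratings = [r for r in ratings if r["seq"] > last_seen_seq]
    for r in new_ratings:
        doc = docs[r["rated_id"]]
        # Keep a running sum and count so the average updates incrementally.
        doc["rating_sum"] = doc.get("rating_sum", 0) + r["value"]
        doc["rating_count"] = doc.get("rating_count", 0) + 1
        doc["computed_rating"] = doc["rating_sum"] / doc["rating_count"]
    # Advance the checkpoint for the next sweep.
    return max((r["seq"] for r in new_ratings), default=last_seen_seq)

docs = {"doc_a": {}, "doc_b": {}}
ratings = [
    {"seq": 1, "rated_id": "doc_a", "value": 4},
    {"seq": 2, "rated_id": "doc_a", "value": 2},
    {"seq": 3, "rated_id": "doc_b", "value": 5},
]

last = sweep(docs, ratings, 0)
print(docs["doc_a"]["computed_rating"], last)  # 3.0 3
```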

I'm not a fan of either of these, because #1 means that if my webapp hiccups, I lose ratings, and #2 means I have to keep sweeping the db for new Rating documents and then go through updating all the documents, which is a pain.

Are there really no other solutions that don't require me to store the computed rating in the doc itself? Is there no way to order on the value from the map/reduce, rather than only being able to order on the key?

I use an external query handler to solve problems that don't fit map/reduce.

I've modified Paul Davis's _external handler to pass the current update_seq whenever an external query is made. In the external process (Ruby, in my case) I maintain a SQLite database that I can use for queries CouchDB isn't suited for. Whenever a query comes in, I compare the supplied update_seq with the one stored in my SQLite db (which, of course, I cache in memory for as long as the process lives). If the SQLite db is out of date, I run _all_docs_by_seq and update SQLite, including the update_seq record, before doing the (SQL) query and responding with a JSON document in the same format Couch would use.
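The freshness check described above can be sketched as follows, using Python's stdlib sqlite3 module in place of the Ruby process. The `fetch_changes_since()` helper is hypothetical; in the real system it would wrap the _all_docs_by_seq call against CouchDB, and the schema here (one averaged rating per document) is an assumption for illustration.

```python
import sqlite3

def handle_query(db, couch_update_seq, fetch_changes_since):
    # Local SQLite mirror: one row per rated document, plus the last
    # update_seq we synced to.
    db.execute("CREATE TABLE IF NOT EXISTS meta (update_seq INTEGER)")
    db.execute(
        "CREATE TABLE IF NOT EXISTS ratings (doc_id TEXT PRIMARY KEY, avg REAL)"
    )
    row = db.execute("SELECT update_seq FROM meta").fetchone()
    local_seq = row[0] if row else 0
    if local_seq < couch_update_seq:
        # Mirror is stale: pull everything changed since local_seq
        # (hypothetical helper standing in for _all_docs_by_seq).
        for doc_id, avg in fetch_changes_since(local_seq):
            db.execute(
                "INSERT OR REPLACE INTO ratings (doc_id, avg) VALUES (?, ?)",
                (doc_id, avg),
            )
        db.execute("DELETE FROM meta")
        db.execute("INSERT INTO meta (update_seq) VALUES (?)", (couch_update_seq,))
        db.commit()
    # Now answer the query CouchDB's views can't: order by computed value.
    return db.execute(
        "SELECT doc_id, avg FROM ratings ORDER BY avg DESC"
    ).fetchall()

db = sqlite3.connect(":memory:")
top = handle_query(db, 42, lambda seq: [("doc_a", 3.0), ("doc_b", 5.0)])
print(top)  # [('doc_b', 5.0), ('doc_a', 3.0)]
```

Because the sync happens lazily on the first query after a change, this has the same update characteristics as a CouchDB view, as the next paragraph notes.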


I can delete the SQLite db at any time (well, while the process is stopped), because it will get recreated and updated as necessary when a query comes in. This works with replication.

This system has the same lazy-update characteristics as Couch views, with the additional advantage that you can do in-memory caching in the external process, which, depending on your update frequency, means you rarely hit the db.

Excellent! What you have created, I think, is a custom view engine. I hope to see more of this (Lucene FT indexing support is another example). Maybe you could write up how you did it? I can review the code or design first, if it helps.

-Damien


