Ok, so I've watched Brett's talk on "Building Scalable Web Apps with App Engine", and I've read tons now on entity groups and transactions, but I'm just completely at a loss to come up with an efficient way to model what seems like a very common scenario. :( If anyone is willing to chime in with ideas I'd be really appreciative.
Here's the basic idea:

- a number of products
- a number of users
- users rate products 1-5
- products have an "average rating"
- users can sort products by rating

Equivalent, and closer to the existing examples:

- a number of blog posts
- a number of users
- users post (many, many) comments on blog posts
- blog posts have a "comment count"
- users can sort posts by comment count

The first problem is that the counter examples I've seen haven't actually needed to count anything that also needed to be in the datastore. As near as I can tell, to accurately count the set of ratings for a particular product I would need an entity group per statistics shard that also contained all of the actual user/product/rating entities. If a user is also able to update their rating/comment, then the assertion that it is easy to change the number of shards later becomes false: if I were to break it up into five groups (let's say) and get a hundred thousand ratings for some product (very likely in my case), then each group is going to have 20,000 entities in it already (with high contention for those updates, even though they don't affect the count), and I have no real way to (safely) move them between shards later.

The second problem is that the existing discussions I've seen of this problem ignore sorting by these statistics. Example: not only might you want the total number of comments posted on a blog post, you also might want to find the "most discussed" blog entries (those with the most comments). The mechanism of using memcache to keep a consistent/efficient count then no longer works very well: you need to store the information in an indexed entity. Also, you can't afford for the information to ever be out of date: if you have a few thousand products in your product catalog and you don't have an up-to-date count for any of them, it isn't an option to rebuild the master statistics for all of them before running your query.
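To make the first problem concrete, here's a rough sketch in plain Python. It makes no datastore calls at all; each RatingShard object just stands in for one entity group, and every name in it (RatingShard, ProductRatings, shard_index, misplaced_after_resharding) is made up for illustration. It assumes a user's rating is assigned to a shard by hashing the user id, which is only one possible way to do it.

```python
import zlib


class RatingShard:
    """Stands in for one entity group: the ratings plus their running totals."""

    def __init__(self):
        self.ratings = {}  # user_id -> rating (1-5)
        self.count = 0     # number of distinct users who have rated
        self.total = 0     # sum of their current ratings

    def rate(self, user_id, rating):
        # In the datastore, this whole method would have to be one transaction.
        previous = self.ratings.get(user_id)
        if previous is None:
            self.count += 1          # a new rating changes the count
        else:
            self.total -= previous   # an update only changes the sum
        self.ratings[user_id] = rating
        self.total += rating


def shard_index(user_id, num_shards):
    # crc32 is stable across runs, unlike the built-in hash() for strings.
    return zlib.crc32(user_id.encode("utf-8")) % num_shards


class ProductRatings:
    """All shards for a single product (hypothetical layout)."""

    def __init__(self, num_shards):
        self.shards = [RatingShard() for _ in range(num_shards)]

    def rate(self, user_id, rating):
        index = shard_index(user_id, len(self.shards))
        self.shards[index].rate(user_id, rating)

    def average(self):
        count = sum(shard.count for shard in self.shards)
        total = sum(shard.total for shard in self.shards)
        return total / count if count else 0.0


def misplaced_after_resharding(product, new_num_shards):
    """Count stored ratings that would belong to a different shard index."""
    moved = 0
    for index, shard in enumerate(product.shards):
        for user_id in shard.ratings:
            if shard_index(user_id, new_num_shards) != index:
                moved += 1
    return moved


product = ProductRatings(num_shards=5)
for n in range(1000):
    product.rate("user%d" % n, 1 + n % 5)
product.rate("user0", 5)  # an update: touches the shard, not the count

print(product.average())
print(misplaced_after_resharding(product, 10))
```

The second number printed is how many of the stored ratings would sit in the "wrong" shard if I went from 5 shards to 10. Each of those is an entity I'd have to move between entity groups, which is exactly the part I don't know how to do safely.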
Because the counts can never be out of date, you need to be constantly maintaining the updated global counts as you change things, and if you want them to be safe/accurate that will need to be done in a transaction, which pretty much pulls the entire product/post into a single giant entity group.

Put these together and it seems like this very website concept simply /requires/ a really slow implementation :(. Specifically: every product/post gets a single entity group, and the comments/ratings are stored in it, with a single master count stored on the product/post entity (updated via transactions as comments/ratings are added), thereby causing massive contention as everyone immediately swoops in to put their comment/rating in.

If this is simply "true" then I'll just go ahead and build this and feel "ok" in that I did everything I could, but as long as I have doubts I'm finding it difficult to force myself to lock this scheme into my data ;P.

-J
