What you describe sounds like a dynamic grouping, which is challenging for any datastore. By challenging, I mean it's difficult to deliver results both quickly and accurately. If you're willing to deal with some hassle and compromise on at least one of those two goals, it should be possible to get something working.
Here's one suggestion: store the list of books each user purchased in that user's "row" as a ListProperty. Then, when you want to show the books purchased by customers who bought a specific book, you can filter on that property with the = operator (it matches any entity whose list contains the value) and collect the matching users' lists. This approach has some challenges, such as the 1000-result query limit and dealing with duplicate references to books. Indexes that use list properties can also get very large, so you'd need to use caution.

Another approach is to create a separate table to hold the co-purchase relationship, with one row for each relationship. So if customer A bought books 1, 2, 3 and customer B bought books 2, 3, 4, the relationship table would have these rows:

first_book  second_book
1           2
1           3
2           3
2           4
3           4

(I might be mistaken, but I think this is similar to how the datastore indexes list properties, by specifying all permutations.)

To make lookups easier, you could add a field recording how many times each co-occurrence happened, like this:

first_book  second_book  count
1           2            1
1           3            1
2           3            2      <-- 2 customers have purchased both 2 and 3
2           4            1
3           4            1

Then, when you want to know which books have been co-purchased with a given book, you would run 2 queries on this table: one for relationships that match on first_book and one for those that match on second_book. That gives you a list of books, and you would still need to check which ones got the most votes.

In either case, you're going to have to deal with a large number of intermediate results that need to be ranked/filtered. You could try to avoid this by adding the vote results to the relationship table and putting them into an index, like this:

first_book  second_book  count  first_book_vote_score  second_book_vote_score ...

but then you would need to keep that score information up to date, which can be a challenge, too.
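To make the two approaches concrete, here is a minimal sketch in plain Python using the customer A/B example. It assumes in-memory dicts stand in for datastore entities, so it shows only the counting and ranking logic, not real datastore queries; the names `purchases`, `also_bought_from_lists`, `build_pair_counts` and `also_bought_from_pairs` are hypothetical and not App Engine APIs.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stand-in for the users' "rows": each user maps to the
# list of books they bought (the list property in the first approach).
purchases = {
    "A": [1, 2, 3],
    "B": [2, 3, 4],
}


def also_bought_from_lists(purchases, book):
    """Approach 1: take every user whose list contains `book` (what an
    equality filter on a list property matches) and tally the other books."""
    votes = Counter()
    for books in purchases.values():
        if book in books:
            for other in set(books):  # set() drops duplicate references
                if other != book:
                    votes[other] += 1
    return votes.most_common()


def build_pair_counts(purchases):
    """Approach 2: one relationship row per co-purchased pair, keyed as
    (first_book, second_book) with first_book < second_book, plus a count."""
    pairs = Counter()
    for books in purchases.values():
        for first, second in combinations(sorted(set(books)), 2):
            pairs[(first, second)] += 1
    return pairs


def also_bought_from_pairs(pairs, book):
    """The two 'queries': rows matching on first_book, then rows matching
    on second_book; the counts are summed as votes and ranked."""
    votes = Counter()
    for (first, second), count in pairs.items():
        if first == book:
            votes[second] += count
        elif second == book:
            votes[first] += count
    return votes.most_common()


if __name__ == "__main__":
    pairs = build_pair_counts(purchases)
    print(dict(pairs))
    print(also_bought_from_pairs(pairs, 2))
    print(also_bought_from_lists(purchases, 2))
```

For book 2, both functions rank book 3 first with 2 votes, matching the "2 customers have purchased both 2 and 3" row. In a real datastore the ranking step is where the large intermediate result sets mentioned above would show up, since every matching row has to be fetched before it can be sorted.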
For a book-voting situation, you could probably relax the accuracy and timeliness goals and do a batch update of the vote results in the relationship table, perhaps using an "updated_timestamp" field in the users' purchases and the books' votes tables.

hth
Dan

On Jul 16, 1:54 am, Marcel Overdijk <[email protected]> wrote:
> I have a very simple model. Users and Books.
> Every user can pick a list of favourite and disliked books.
>
> Now I'd like to present a public page with the most favourite books.
> This is no problem, I could create a counter (perhaps a sharded
> counter).
>
> But when I click a specific book I want to display its details + (and
> now comes the difficult part) a list of books which Users also voted
> for who voted for the particular book displayed.
>
> Similar to Amazon's "People who bought this book, also bought..."
>
> With a relational db query this would be solved very easily, but I
> don't see an effective way to do this with big table.
>
> Any ideas?
