[google-appengine] Re: Looking for solution for bigtable problem (which can be solved in relational db easily)

Dan Thu, 16 Jul 2009 10:05:51 -0700

What you describe sounds like a dynamic grouping, which is challenging
for any datastore. By challenging, I mean it's difficult to deliver
quickly and accurately. If you're willing to deal with some hassle and
compromise a bit on (at least one of) those two objectives, then it
should be possible to get something working.

Here's a suggestion:
Store the list of books purchased in each user's "row" as a
ListProperty. Then, when you want to show the books that were
purchased by customers who bought a specific book, you could use the
= operator to fetch the rows whose list contains that book, and
collect the other books from those lists. This approach has some
challenges, such as the 1000-row limit and duplicate references to
books. Indices that use list properties can also get very large, so
you'd need to use caution.
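
As a rough illustration of the list-property idea, here is an in-memory
Python sketch. The `purchases` dictionary and the `also_bought` function
are made up for this example; on App Engine the membership test would be
an equality filter on the list property instead of a loop, and the
number of rows fetched would be capped.

```python
from collections import Counter

# Hypothetical data: each user's "row" holds a list of purchased book IDs.
purchases = {
    "customer_a": [1, 2, 3],
    "customer_b": [2, 3, 4],
}

def also_bought(book_id, purchases):
    """Count the other books bought by customers who bought book_id."""
    counts = Counter()
    for books in purchases.values():
        unique_books = set(books)  # ignore duplicate references to a book
        if book_id in unique_books:
            counts.update(unique_books - {book_id})
    # Most frequently co-purchased first; ties broken by book ID.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

print(also_bought(2, purchases))
```

The ranking happens in application code after the rows are fetched,
which is where the intermediate result set can grow large.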

Another approach is to create a separate table to contain the
co-purchase relationship. It would contain 1 row for each relationship.
So if customer A bought books 1,2,3 and customer B bought books 2,3,4
then the table of relationships would have these rows:
first_book  second_book
1           2
2           3
1           3
2           4
3           4
(I might be mistaken, but I think this is similar to how the datastore
indexes list properties, by specifying all permutations.)
To make things easier to look up, you could add a field to indicate
how many times the co-occurrence happened, like this:
first_book  second_book  count
1           2            1
2           3            2   <-- 2 customers have purchased both 2 and 3
1           3            1
2           4            1
3           4            1
Then, when you want to know which books have been co-purchased with a
given book, you would run 2 queries on this table, one matching on
first_book and one on second_book. You would then have a list of
books, and would need to check to see which ones got the most votes.
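
A minimal sketch of the relationship table and the two lookups, again
in plain Python and not datastore code. The names `build_relationships`
and `co_purchased` are hypothetical, and storing each pair with the
smaller book ID first is an assumption made here so that a pair is
never stored twice.

```python
from collections import Counter
from itertools import combinations

def build_relationships(purchases):
    """Return {(first_book, second_book): count}, with first_book < second_book."""
    relationships = Counter()
    for books in purchases.values():
        # Every unordered pair of distinct books bought by one customer.
        for pair in combinations(sorted(set(books)), 2):
            relationships[pair] += 1
    return relationships

def co_purchased(book_id, relationships):
    """Books co-purchased with book_id, highest count first."""
    # "Query" 1: rows where first_book matches.
    as_first = {second: n for (first, second), n in relationships.items()
                if first == book_id}
    # "Query" 2: rows where second_book matches.
    as_second = {first: n for (first, second), n in relationships.items()
                 if second == book_id}
    merged = {**as_first, **as_second}
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))

relationships = build_relationships({
    "customer_a": [1, 2, 3],
    "customer_b": [2, 3, 4],
})
print(co_purchased(3, relationships))
```

Because the counts are maintained when purchases are written, the read
side only has to merge two result sets and sort them.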

In either case, you're going to have to deal with the challenge of a
large number of intermediate results that need to be ranked/filtered.
You could try to avoid this problem by adding the vote results to the
relationship table and putting it into an index, like this:
first_book second_book count first_book_vote_score second_book_vote_score
... but then you would need to keep that score information up-to-date,
which can be a challenge, too. For a book voting situation, you could
probably relax the accuracy and timeliness goals and do a batch update
of the vote results in the relationship table, perhaps using an
"updated_timestamp" field in the users' purchases and the books' votes
tables.
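
One way such a batch update might look, sketched in plain Python. The
row layout, the `refresh_vote_scores` function and the sample values
are all assumptions for illustration; only the "updated_timestamp" idea
comes from the text above. The sketch covers the votes table only; the
purchase counts could be refreshed in a similar pass.

```python
from datetime import datetime

def refresh_vote_scores(relationship_rows, book_votes, last_run):
    """Copy vote scores changed since last_run into the relationship rows.

    Returns the number of relationship rows that were updated.
    """
    # Only books whose votes changed after the previous batch run.
    changed = {
        book_id: vote["score"]
        for book_id, vote in book_votes.items()
        if vote["updated_timestamp"] > last_run
    }
    updated = 0
    for row in relationship_rows:
        touched = False
        if row["first_book"] in changed:
            row["first_book_vote_score"] = changed[row["first_book"]]
            touched = True
        if row["second_book"] in changed:
            row["second_book_vote_score"] = changed[row["second_book"]]
            touched = True
        if touched:
            updated += 1
    return updated

# Hypothetical sample data.
rows = [
    {"first_book": 1, "second_book": 2, "count": 1,
     "first_book_vote_score": 5, "second_book_vote_score": 7},
    {"first_book": 2, "second_book": 3, "count": 2,
     "first_book_vote_score": 7, "second_book_vote_score": 1},
]
votes = {
    1: {"score": 5, "updated_timestamp": datetime(2009, 7, 1)},
    2: {"score": 9, "updated_timestamp": datetime(2009, 7, 16)},
    3: {"score": 1, "updated_timestamp": datetime(2009, 7, 2)},
}
updated_rows = refresh_vote_scores(rows, votes, last_run=datetime(2009, 7, 10))
```

Between runs the scores in the relationship table are stale, which is
the accuracy and timeliness trade-off described above.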

hth
Dan

On Jul 16, 1:54 am, Marcel Overdijk <[email protected]> wrote:
> I have a very simple model. Users and Books.
> Every user can pick a list of favourite and disliked books.
>
> Now I'd like to present a public page with the most favourite books.
> This is no problem, I could create a counter (perhaps a sharded
> counter).
>
> But when I click a specific book I want to display its details + (and
> now comes the difficult part) a list of books that were also voted
> for by the Users who voted for the particular book displayed.
>
> Similar to Amazon's "People who bought this book, also bought..."
>
> With a relational db query this would be solved very easily, but I
> don't see an effective way to do this with Bigtable.
>
> Any ideas?