[google-appengine] Re: Looking for solution for bigtable problem (which can be solved in relational db easily)

Marcel Overdijk Thu, 16 Jul 2009 11:29:50 -0700

Yes, I had also reached the point of creating a counter for each
combination.

But it will indeed be a challenge.
Users will be able to vote for favourite and disliked books, and they
can also revoke their votes, which could mean a lot of counters to
update.
So imagine I have 10 favourite books and 10 disliked books. There
will be a counter of at least 1 for each combination of favourite and
disliked books.
When I remove one of the favourite books from my list, I have to update
all the combinations that include it.
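
To make the bookkeeping concrete, here is a minimal in-memory sketch of
per-combination counters. It is only an illustration under assumptions:
it tracks favourites only (no disliked books), and the names pair_counts,
favourites, add_favourite, revoke_favourite and also_favoured are
hypothetical stand-ins for datastore entities and handlers, not App Engine
API calls.

```python
from collections import Counter
from itertools import combinations  # not required below, kept out of the hot path

# Hypothetical in-memory stand-ins for datastore entities.
pair_counts = Counter()   # (book_a, book_b) -> number of users who favour both
favourites = {}           # user -> set of favourite book ids


def _pair(a, b):
    # Store each unordered pair once, smaller id first.
    return (a, b) if a <= b else (b, a)


def add_favourite(user, book):
    books = favourites.setdefault(user, set())
    if book in books:
        return
    # One counter increment per book already on the user's list.
    for other in books:
        pair_counts[_pair(book, other)] += 1
    books.add(book)


def revoke_favourite(user, book):
    books = favourites.get(user, set())
    if book not in books:
        return
    books.remove(book)
    # One counter decrement per remaining book on the user's list.
    for other in books:
        key = _pair(book, other)
        pair_counts[key] -= 1
        if pair_counts[key] <= 0:
            del pair_counts[key]


def also_favoured(book, limit=5):
    # Look at pairs that mention the book on either side and rank the partners.
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == book:
            scores[b] += n
        elif b == book:
            scores[a] += n
    return scores.most_common(limit)
```

With a list of 10 favourites, a single revoke in this sketch touches 9
counters, and each of those writes is a place where a failure could leave
the counts inconsistent.
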


It's not only a challenge to implement, but also to keep
consistent in case of failure. The harder it is to
implement, the more likely there will be a bug...

With a relational database I would compromise on real-time
accuracy: I would just run a relational query once an hour and store
the results for all combinations in a cache.
With a relational query I could join the tables easily. I would also
not need to worry about data inconsistency, as it could not happen:
the results are recomputed from the source tables on every run.
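
As a rough illustration of the relational version, the sketch below uses
Python's built-in sqlite3 module with one self-join. The table name
favourites, its columns, and the helper co_favourites are assumptions made
up for this example; the sample data reuses Dan's customers A (books 1,2,3)
and B (books 2,3,4).

```python
import sqlite3


def co_favourites(conn):
    """Return {(book, other_book): user_count} computed with one self-join."""
    rows = conn.execute(
        """
        SELECT a.book_id, b.book_id, COUNT(*) AS users
        FROM favourites AS a
        JOIN favourites AS b
          ON a.user_id = b.user_id AND a.book_id <> b.book_id
        GROUP BY a.book_id, b.book_id
        """
    ).fetchall()
    return {(book, other): users for book, other, users in rows}


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE favourites ("
    "user_id TEXT, book_id INTEGER, PRIMARY KEY (user_id, book_id))"
)
conn.executemany(
    "INSERT INTO favourites VALUES (?, ?)",
    [("A", 1), ("A", 2), ("A", 3), ("B", 2), ("B", 3), ("B", 4)],
)

# Rebuilt from scratch on every run (for example hourly), so the cached
# counts can be stale but never drift away from the votes table.
cache = co_favourites(conn)
```

The result holds both orderings of each pair, so a lookup for one book is
a scan for keys whose first element is that book.
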

I'm still open to any other ideas, and it's interesting to discuss
Bigtable vs. more traditional relational databases. Maybe at some
moment in the future I will see the light of Bigtable, but not at
the moment, to be honest.

Thanks again,
Marcel


On Jul 16, 7:05 pm, Dan <[email protected]> wrote:
> What you describe sounds like a dynamic grouping, which is challenging
> for any datastore. By challenging, I mean it's difficult to deliver
> quickly and accurately. If you're willing to deal with some hassle and
> compromise a bit on (at least one of) those two objectives, then it
> should be possible to get something working.
>
> Here's a suggestion:
> Store a list of books purchased in each user's "row" as a
> ListProperty. Then, when you want to show a list of books that were
> purchased by customers who bought a specific book, you could use the
> = operator to get a list of those ListProperties. This approach has
> some challenges, like the 1000-row limit and dealing with duplicate
> references to books. Indexes that use list properties can also get very
> large, so you'd need to use caution.
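
The list-based idea quoted above can be sketched in plain Python. This is
an in-memory illustration only, not datastore code: the users dict and the
also_purchased helper are hypothetical, the membership test stands in for
an equality filter on a list property, and limit=1000 mirrors the row
limit Dan mentions.

```python
from collections import Counter

# Hypothetical stand-in for user entities that each hold a list of book ids.
users = {
    "A": [1, 2, 3],
    "B": [2, 3, 4],
    "C": [5],
}


def also_purchased(book, limit=1000):
    # Match every user whose list contains the book, capped at `limit` rows.
    matching = [books for books in users.values() if book in books][:limit]
    tally = Counter()
    for books in matching:
        # set() guards against duplicate references to the same book.
        tally.update(b for b in set(books) if b != book)
    return tally.most_common()
```

All ranking happens in application code after the fetch, which is where
the cost shows up once many users share a popular book.
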
>
> Another approach is to create a separate table to contain the
> co-purchase relationship. It would contain 1 row for each relationship.
> So if customer A bought books 1,2,3 and customer B bought books 2,3,4
> then the table of relationships would have these rows:
> first_book  second_book
> 1           2
> 2           3
> 1           3
> 2           4
> 3           4
> (I might be mistaken, but I think this is similar to how the datastore
> indexes list properties, by specifying all permutations.)
> To make things easier to look up, you could add a field to indicate
> how many times the co-occurrence happened, like this:
> first_book  second_book  count
> 1           2            1
> 2           3            2   <-- 2 customers have purchased both 2 and 3
> 1           3            1
> 2           4            1
> 3           4            1
> Then, when you wanted to know which books have been co-purchased with
> a given book, you would run 2 queries on this table, looking for
> relationships that match on first_book and then on second_book. You
> would then have a list of books, and would need to check which ones
> got the most votes.
>
> In either case, you're going to have to deal with the challenge of a
> large number of intermediate results that need to be ranked/filtered.
> You could try to avoid this problem by adding the vote results to the
> relationship table and putting it into an index, like this:
> first_book second_book count first_book_vote_score second_book_vote_score
> ... but then you would need to keep that score information up-to-date,
> which can be a challenge, too. For a book voting situation, you could
> probably relax the accuracy and timeliness goals and do a batch update
> of the vote results in the relationship table, perhaps using an
> "updated_timestamp" field in the users' purchases and the books' votes
> tables.
>
> hth
> Dan
>
> On Jul 16, 1:54 am, Marcel Overdijk <[email protected]> wrote:
>
>
>
> > I have a very simple model. Users and Books.
> > Every user can pick a list of favourite and disliked books.
>
> > Now I would like to present a public page with the most favourited books.
> > This is no problem; I could create a counter (perhaps a sharded
> > counter).
>
> > But when I click a specific book I want to display its details plus (and
> > now comes the difficult part) a list of other books voted for by the
> > users who voted for the book being displayed.
>
> > Similar to Amazon's "People who bought this book, also bought..."
>
> > With a relational db query this would be solved very easily, but I
> > don't see an effective way to do this with Bigtable.
>
> > Any ideas?
