[google-appengine] Pattern for histograms of ratings?

Mark Mon, 09 Aug 2010 11:04:49 -0700

Hi,

I have a web app where users can add friends, and can rate products.
The model looks like:


    class User {
        String username;
    }

    class Friend {
        String username;
        String usernameFriend;
    }

    class ProductRating {
        String username;
        String productId;
        int rating; // 1 - 5
    }

When a user is viewing a product, I want to show them a histogram of
the ratings all their friends gave the product. Since the histogram is
not valuable unless all information is known, this becomes difficult
to do at scale because I need to:

  1) For the given user, load all their friend names (could be
hundreds or thousands).
  2) For each friend, check whether they have given a rating for the
product of interest.
  3) Aggregate all friend ratings into a histogram.
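
To make the cost of these three steps concrete, here is a minimal sketch using plain in-memory collections. The class name NaiveHistogram and the ratingsByUser map are hypothetical stand-ins, not datastore API calls; in the real app, each map lookup would be a separate ProductRating fetch.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: the map stands in for per-friend
// ProductRating lookups for a single product.
final class NaiveHistogram {

    // Returns counts[0..4] for ratings 1..5.
    static int[] build(List<String> friendNames, Map<String, Integer> ratingsByUser) {
        int[] counts = new int[5];
        for (String friend : friendNames) {              // step 1: every friend is visited
            Integer rating = ratingsByUser.get(friend);  // step 2: one lookup per friend
            if (rating != null && rating >= 1 && rating <= 5) {
                counts[rating - 1]++;                    // step 3: aggregate
            }
        }
        return counts;
    }
}
```

The work grows linearly with the number of friends, even when only a few of them have rated the product.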

I'll probably time out fetching and deserializing all those objects in
steps 1 & 2. I can precompute histograms for each user for each
product as everyone submits ratings. This would optimize reads later
on but would substantially increase storage requirements and add CPU
work on every rating submission. As friend relationships change, I
would also have to update all precomputed histograms, which would be a
pain.



I'm thinking of doing the following, and wondering how poor an idea it
is. The basic idea is to keep a user's friends in one flat Text object
and a product's ratings in another, then build histograms in
application code, either on the server or on the clients themselves:

  class User {
      String username;
  }

  class UserFriends {
      String username;
      Text friends;
  }

  class ProductRatings {
      String productId;
      Text ratings;
  }

A user's friends string might look like:

  UserFriends.friends = "kim,greg,jen,ed,friendN";

A product's rating string might look like:

  ProductRatings.ratings = "kim:4,tim:5,ed:2,usernameN:ratingN";

So, in order to build the histogram, I need to:

  // get my flat friends string.
  select from UserFriends where username='myusername';

  // get the flat ratings string for the product.
  select from ProductRatings where productId='xyz';

Once I have both flat strings, I can generate the histogram in
application code. The idea is that I have a better chance of storing
all friend and rating information in the flat Text objects and
fetching it in a single HTTP connection than I would if I had to
fetch all the individual objects.
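
A minimal sketch of that application-side step, assuming the two string formats shown above and that usernames never contain commas or colons. The class and method names (FlatHistogram, parseFriends, build) are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; names are illustrative only.
final class FlatHistogram {

    // Parses "kim,greg,jen" into a set so membership checks are cheap.
    static Set<String> parseFriends(String friends) {
        Set<String> result = new HashSet<>();
        for (String name : friends.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    // Returns counts[0..4] for ratings 1..5, counting only entries whose
    // username is in the friend set. Malformed or out-of-range entries
    // are skipped rather than treated as errors.
    static int[] build(String friends, String ratings) {
        Set<String> friendSet = parseFriends(friends);
        int[] counts = new int[5];
        for (String entry : ratings.split(",")) {
            int colon = entry.lastIndexOf(':');
            if (colon <= 0) {
                continue; // no "username:rating" pair in this entry
            }
            String username = entry.substring(0, colon).trim();
            int rating;
            try {
                rating = Integer.parseInt(entry.substring(colon + 1).trim());
            } catch (NumberFormatException e) {
                continue; // rating is not a number
            }
            if (rating >= 1 && rating <= 5 && friendSet.contains(username)) {
                counts[rating - 1]++;
            }
        }
        return counts;
    }
}
```

With the example strings above, FlatHistogram.build("kim,greg,jen,ed", "kim:4,tim:5,ed:2") counts kim's 4 and ed's 2 and ignores tim, who is not a friend. The cost is one pass over the product's full ratings string, so it grows with the total number of ratings on the product rather than with the number of friends.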

I was wondering if anyone else has had to do something similar to
this, or if there are any approaches I'm overlooking. I spent a lot of
time implementing variations on Brett Slatkin's Google I/O 2008 talk
about building scalable applications on App Engine, specifically the
microblogging example. In the end, the introduction of a changing
friend network which impacts these histograms made any of my attempts
too costly to run.

Any thoughts positive or negative would be welcome!

Thanks

