Re: [google-appengine] Pattern for histograms of ratings?

Jeff Schwartz Mon, 09 Aug 2010 14:29:19 -0700

I haven't watched the vid in a while & you might be right as it seems the
one you watched is the one I was referring to.


In any case, all the issues you raised, which were glossed over in
the presentation, are manageable.

You can use tasks to maintain distribution lists, but yes, the model
presented was a Twitter-type model where changes to one's syndication only
affect new content.

Unfortunately, you might find yourself in a box if you try to implement
something different. I've been saying for a long time that what the
datastore needs more than anything is support for joins at the API level. If
it had that, using entities as indirect indexes would not be needed.

Good luck.

Jeff

On Mon, Aug 9, 2010 at 4:12 PM, Mark Wyszomierski <[email protected]> wrote:

> Thanks Jeff. I think the I/O video I saw was from 2008, given by Brett, not
> sure if it's the same one you're referencing (maybe I mixed them up) but it
> does talk about the fan out problem for microblogging and gives a solution.
>
> The solution in the video is comprehensible, but two points stuck out for
> me:
>
> 1) You need to write all the recipients into each Message object. If the
> author of the message has a large number of friends, writing can take a good
> amount of time. This can be handled in the background using the task queue.
> 2) The Message.recipients list is essentially locked after publication. So
> if I write a message today, and you become my follower tomorrow, my message
> from yesterday won't show up in your feed (you weren't my friend when I
> published the message). That's ok for twitter, but for building a histogram
> it won't work because I need knowledge of all your current friends' ratings
> for the product.
>
> The first item is a little painful but has to be paid for somewhere; item 2
> is more problematic. If I treat each product rating instance like
> a tweet, and embed the recipients (the author's current friend network) -
> then when friends are added or dropped, I need to go through and update
> every past product rating instance with the updated friend listing.
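
The stale-recipients problem described above can be sketched in plain Java. This is only an illustration under stated assumptions: the class `FanOutSketch`, its in-memory maps, and the method names are hypothetical stand-ins and not App Engine datastore code. Each message copies the author's follower set at publish time, so someone who follows later never sees earlier messages unless every old message is rewritten.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of fan-out on write; not datastore code.
final class FanOutSketch {
    static final class Message {
        final String author;
        final String body;
        final Set<String> recipients; // snapshot taken at publish time

        Message(String author, String body, Set<String> recipients) {
            this.author = author;
            this.body = body;
            this.recipients = recipients;
        }
    }

    private final Map<String, Set<String>> followers = new HashMap<>();
    private final List<Message> messages = new ArrayList<>();

    void follow(String follower, String author) {
        followers.computeIfAbsent(author, k -> new HashSet<>()).add(follower);
    }

    void publish(String author, String body) {
        // Copy the current follower set; later follows do not change this copy.
        Set<String> current = followers.getOrDefault(author, Collections.<String>emptySet());
        messages.add(new Message(author, body, new HashSet<>(current)));
    }

    // A user's feed is every message whose recipient snapshot contains them.
    List<String> feed(String user) {
        List<String> out = new ArrayList<>();
        for (Message m : messages) {
            if (m.recipients.contains(user)) {
                out.add(m.body);
            }
        }
        return out;
    }
}
```

If "kim" follows "mark" before he publishes and "ed" follows afterwards, `feed("kim")` contains the message and `feed("ed")` is empty. That is acceptable for a timeline but not for a histogram that must reflect all current friends.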
>
> I'll see if I can find the 2009 presentation and see if it's different than
> the one I saw, thanks for your help,
>
> Mark
>
>
>
> On Mon, Aug 9, 2010 at 11:59 AM, Jeff Schwartz <[email protected]> wrote:
>
>> You might want to catch the Google I/O 2009 vid on YouTube where fan-out is
>> discussed. In the vid, listindex entities and keys-only queries are
>> discussed as a way of defining and selecting large groups. If you can wrap
>> your head around the concepts and understand how the mentioned
>> implementations work, you will have your answer. It is doable but it isn't
>> very pretty. The good part is that it provides very quick queries and
>> eliminates serializing entities that are only used as indexes.
>>
>> Just my $0.02.
>>
>> Jeff
>>
>> On Mon, Aug 9, 2010 at 2:04 PM, Mark <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I have a web app where users can add friends, and can rate products.
>>> The model looks like:
>>>
>>>    class User {
>>>        String username;
>>>    }
>>>
>>>    class Friend {
>>>        String username;
>>>        String usernameFriend;
>>>    }
>>>
>>>    class ProductRating {
>>>        String username;
>>>        String productId;
>>>        int rating; // 1 - 5
>>>    }
>>>
>>> When a user is viewing a product, I want to show them a histogram of
>>> the ratings all their friends gave the product. Since the histogram is
>>> not valuable unless all information is known, this becomes difficult
>>> to do at scale because I need to:
>>>
>>>  1) For the given user, load all their friend names (could be
>>> hundreds or thousands).
>>>  2) For each friend, check if any of them have given a rating for the
>>> product of interest.
>>>  3) Aggregate all friend ratings into a histogram.
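
A minimal sketch of the three steps above, written against in-memory maps rather than the datastore. The class `NaiveFriendHistogram`, its maps, and the `"username|productId"` key format are assumptions made for illustration; in a real app each lookup in step 2 would be a datastore read, which is where the cost comes from.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the Friend and ProductRating entities.
final class NaiveFriendHistogram {
    // username -> that user's friend usernames (the Friend rows)
    private final Map<String, List<String>> friendsOf = new HashMap<>();
    // "username|productId" -> rating 1..5 (the ProductRating rows);
    // assumes usernames and product ids never contain '|'
    private final Map<String, Integer> ratings = new HashMap<>();

    void addFriend(String username, String usernameFriend) {
        friendsOf.computeIfAbsent(username, k -> new ArrayList<>()).add(usernameFriend);
    }

    void rate(String username, String productId, int rating) {
        if (rating < 1 || rating > 5) {
            throw new IllegalArgumentException("rating must be 1-5");
        }
        ratings.put(username + "|" + productId, rating);
    }

    // Returns counts[i] = number of friends who gave rating i + 1.
    int[] histogram(String username, String productId) {
        int[] counts = new int[5];
        // Step 1: load all friend names for the user.
        List<String> friends = friendsOf.getOrDefault(username, new ArrayList<>());
        for (String friend : friends) {
            // Step 2: one lookup per friend for the product of interest.
            Integer rating = ratings.get(friend + "|" + productId);
            // Step 3: aggregate into the histogram.
            if (rating != null) {
                counts[rating - 1]++;
            }
        }
        return counts;
    }
}
```

The work grows linearly with the number of friends, one lookup each, which matches the concern about hundreds or thousands of friends.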
>>>
>>> I'll probably time out fetching and deserializing all those objects on
>>> steps 1 & 2. I can precompute histograms for each user for each
>>> product as everyone submits ratings. This would optimize reads later
>>> on but would really increase storage requirements and add additional
>>> cpu use on every rating submission. As friend relationships change, I
>>> would have to also update all precomputed histograms, which would be a
>>> pain.
>>>
>>>
>>>
>>> I'm thinking of doing the following, and wondering how poor an idea it
>>> is. The basic idea is to keep a flat Text object of a user's friends,
>>> and a product's ratings to build histograms in application code,
>>> either on the server or the clients themselves:
>>>
>>>  class User {
>>>      String username;
>>>  }
>>>
>>>  class UserFriends {
>>>      String username;
>>>      Text friends;
>>>  }
>>>
>>>  class ProductRatings {
>>>      String productId;
>>>      Text ratings;
>>>  }
>>>
>>> A user's friends string might look like:
>>>
>>>  UserFriends.friends = "kim,greg,jen,ed,friendN";
>>>
>>> A product's rating string might look like:
>>>
>>>  ProductRatings.ratings = "kim:4,tim:5,ed:2,usernameN:ratingN";
>>>
>>> so in order to build the histogram, I need to:
>>>
>>>  // get my flat friends string.
>>>  select from UserFriends where username='myusername';
>>>
>>>  // get the flat ratings string for the product.
>>>  select from ProductRatings where productId='xyz';
>>>
>>> Once I have both flat strings, I can generate the histogram in
>>> application code. The idea is that I have a better chance of storing
>>> all friends and ratings information in the flat Text objects and
>>> fetching it in a single http connection than if I were to
>>> fetch all the individual objects.
>>>
>>> I was wondering if anyone else has had to do something similar to
>>> this, or if there are any approaches I'm overlooking. I spent a lot of
>>> time implementing variations on Brett Slatkin's Google I/O 2008 talk
>>> about building scalable applications on app engine, specifically the
>>> microblogging example. In the end, the introduction of a changing
>>> friend network which impacts these histograms made any of my attempts
>>> too costly to run.
>>>
>>> Any thoughts positive or negative would be welcome!
>>>
>>> Thanks
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "Google App Engine" group.
>>> To post to this group, send email to [email protected].
>>> To unsubscribe from this group, send email to
>>> [email protected].
>>> For more options, visit this group at
>>> http://groups.google.com/group/google-appengine?hl=en.
>>>
>>>
>>
>>
>> --
>> --
>> Jeff
>>
>>
>
>




