Thanks Jeff. I think the I/O video I saw was from 2008, given by Brett. I'm not sure if it's the same one you're referencing (maybe I mixed them up), but it does talk about the fan-out problem for microblogging and gives a solution.
The solution in the video is comprehensible, but two points stuck out for me:

1) You need to write all the recipients into each Message object. If the author of the message has a large number of friends, the write can take a good amount of time. This can be handled in the background using the task queue.

2) The Message.recipients list is essentially locked after publication. So if I write a message today and you become my follower tomorrow, today's message won't show up in your feed (you weren't my friend when I published it). That's OK for Twitter, but for building a histogram it won't work, because I need to know all of your current friends' ratings for the product.

The first item is a little painful, but that cost has to be paid somewhere. Item 2 is more problematic. If I treat each product rating instance like a tweet and embed the recipients (the author's current friend network), then whenever friends are added or dropped I need to go through and update every past product rating instance with the updated friend list.

I'll see if I can find the 2009 presentation and check whether it's different from the one I saw.

Thanks for your help,

Mark

On Mon, Aug 9, 2010 at 11:59 AM, Jeff Schwartz <[email protected]> wrote:

> You might want to catch the Google IO 09 vid on YouTube where fan-out is
> discussed. In the vid, listindex entities and keys-only queries are
> discussed as a way of defining and selecting large groups. If you can wrap
> your hands around the concepts and understand how the mentioned
> implementation works, you will have your answer. It is doable but it isn't
> very pretty. The good part is that it provides very quick queries and
> eliminates serializing entities that are only used as indexes.
>
> Just my $0.02.
>
> Jeff
>
> On Mon, Aug 9, 2010 at 2:04 PM, Mark <[email protected]> wrote:
>
>> Hi,
>>
>> I have a web app where users can add friends, and can rate products.
>> The model looks like:
>>
>>     class User {
>>         String username;
>>     }
>>
>>     class Friend {
>>         String username;
>>         String usernameFriend;
>>     }
>>
>>     class ProductRating {
>>         String username;
>>         String productId;
>>         int rating; // 1 - 5
>>     }
>>
>> When a user is viewing a product, I want to show them a histogram of
>> the ratings all their friends gave the product. Since the histogram is
>> not valuable unless all information is known, this becomes difficult
>> to do at scale because I need to:
>>
>>   1) For the given user, load all their friend names (could be
>>      hundreds or thousands).
>>   2) For each friend, check whether they have given a rating for the
>>      product of interest.
>>   3) Aggregate all friend ratings into a histogram.
>>
>> I'll probably time out fetching and deserializing all those objects in
>> steps 1 & 2. I can precompute histograms for each user for each
>> product as everyone submits ratings. This would optimize reads later
>> on, but would really increase storage requirements and add additional
>> CPU use on every rating submission. As friend relationships change, I
>> would also have to update all precomputed histograms, which would be a
>> pain.
>>
>> I'm thinking of doing the following, and wondering how poor an idea it
>> is. The basic idea is to keep a flat Text object of a user's friends,
>> and a product's ratings, to build histograms in application code,
>> either on the server or the clients themselves:
>>
>>     class User {
>>         String username;
>>     }
>>
>>     class UserFriends {
>>         String username;
>>         Text friends;
>>     }
>>
>>     class ProductRatings {
>>         String productId;
>>         Text ratings;
>>     }
>>
>> A user's friends string might look like:
>>
>>     UserFriends.friends = "kim,greg,jen,ed,friendN";
>>
>> A product's rating string might look like:
>>
>>     ProductRatings.ratings = "kim:4,tim:5,ed:2,usernameN:ratingN";
>>
>> so in order to build the histogram, I need to:
>>
>>     // get my flat friends string.
>>     select from UserFriends where username='myusername';
>>
>>     // get the flat ratings string for the product.
>>     select from ProductRatings where productId='xyz';
>>
>> Once I have both flat strings, I can generate the histogram in
>> application code. The idea is that I have a better chance of storing
>> all friends and ratings information in the flat Text objects and
>> fetching it in a single HTTP connection than if I were to fetch all
>> the individual objects.
>>
>> I was wondering if anyone else has had to do something similar to
>> this, or if there are any approaches I'm overlooking. I spent a lot of
>> time implementing variations on Brett Slatkin's Google I/O 2008 talk
>> about building scalable applications on App Engine, specifically the
>> microblogging example. In the end, the introduction of a changing
>> friend network which impacts these histograms made any of my attempts
>> too costly to run.
>>
>> Any thoughts, positive or negative, would be welcome!
>>
>> Thanks
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Google App Engine" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected]<google-appengine%[email protected]>.
>> For more options, visit this group at
>> http://groups.google.com/group/google-appengine?hl=en.
>
> --
> Jeff
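
For illustration, here is a minimal sketch of the "build the histogram in application code" step from the original question. It assumes the two flat strings have already been fetched, that usernames contain no commas or colons, and that ratings are integers from 1 to 5. The class name FriendHistogram and the method build are hypothetical and not part of any App Engine API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class FriendHistogram {

    /**
     * Counts how many of the user's friends gave each rating.
     *
     * friendsCsv looks like "kim,greg,jen,ed" (UserFriends.friends).
     * ratingsCsv looks like "kim:4,tim:5,ed:2" (ProductRatings.ratings).
     * Returns an int[6]: index r holds the count for rating r, index 0 is unused.
     */
    public static int[] build(String friendsCsv, String ratingsCsv) {
        int[] histogram = new int[6];
        if (friendsCsv.isEmpty() || ratingsCsv.isEmpty()) {
            return histogram;
        }

        // A hash set gives constant-time "is this rater my friend?" checks.
        Set<String> friends = new HashSet<String>(Arrays.asList(friendsCsv.split(",")));

        for (String entry : ratingsCsv.split(",")) {
            int sep = entry.lastIndexOf(':');
            if (sep < 0) {
                continue; // skip malformed entries with no "username:rating" separator
            }
            String username = entry.substring(0, sep);
            int rating;
            try {
                rating = Integer.parseInt(entry.substring(sep + 1).trim());
            } catch (NumberFormatException e) {
                continue; // skip entries whose rating is not a number
            }
            if (rating >= 1 && rating <= 5 && friends.contains(username)) {
                histogram[rating]++;
            }
        }
        return histogram;
    }

    public static void main(String[] args) {
        int[] h = build("kim,greg,jen,ed", "kim:4,tim:5,ed:2,jen:4");
        System.out.println(Arrays.toString(h)); // prints [0, 0, 1, 0, 2, 0]
    }
}
```

Because the friend list is read at request time, a friend added or dropped today is reflected in the next histogram without rewriting any past ratings. The trade-off is that both strings grow with the number of friends and raters, so this only helps while they stay within the datastore's entity size limit.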
