[google-appengine] Re: Getting around joins/in queries

Jason C Tue, 28 Jul 2009 19:03:55 -0700

asynctools uses RPC callbacks to kick off the Iterators in parallel
too. This means that there are some limits to the type of result sets,
but for a particular set of queries they are, of course, monstrously
faster.


j

On Jul 28, 3:54 pm, bFlood <[email protected]> wrote:
> asynctools - excellent stuff, thx for releasing it!
>
> how have you found the results when compared to running the queries
> serially? was it significantly faster? when I was messing around with
> the plumbing to make async qgl queries, I noticed they did run in
> parallel but the datastore.Iterator._Get calls were running serially,
> which I guess is as expected. this seemed to be where a large portion
> of the instance cpu was spent so the perf gain was not huge (they were
> key_only queries though so maybe that was a factor)
>
> that being said, its very possible/likely I was botching some
> important part so I will definitely try out your code. thanks much
>
> cheers
> brian
>
> On Jul 28, 3:51 pm, Jason C <[email protected]> wrote:
>
>
>
> > We have some very early code that allows for multiple parallel
> > queries:http://code.google.com/p/asynctools/Thisallows for a
> > different class of reading and avoids the need to compute/store
> > (certain types of) additional information.
>
> > In fact, it was Facebook friends graph stuff that inspired it in the
> > first place. Those escapades are detailed 
> > here:http://squeeville.com/2009/07/24/asynctools/
>
> > j
>
> > On Jul 28, 9:03 am, Devel63 <[email protected]> wrote:
>
> > > I haven't given your situation much thought, but you could write the
> > > list to memcache and have a cron job that reads memcache and does the
> > > writing.  You don't get instant results, but the user doesn't have to
> > > wait.
>
> > > Caution: you aren't going to be able to do queries that return 500 or
> > > 1000 results anyway ... it will probably take too long to return.  As
> > > I said, I haven't taken the time to really understand what you are
> > > doing, but perhaps you will need to rethink the architecture.
>
> > > On Jul 27, 12:10 pm, Taylor <[email protected]> wrote:
>
> > > > Hi,
>
> > > > I have a facebook app that naturally would like to perform set
> > > > operations on a logged in user's friends against the data in the
> > > > application (stored in app engine).  This presents a particular
> > > > challenge, suppose for example that in my application a logged in user
> > > > can subscribe to a particular event, we then have some classes like
> > > > this:
>
> > > > User {
> > > >    Long key;
> > > >    Long facebookId;
> > > >    // etc...
>
> > > > }
>
> > > > Event {
> > > >   Long key;
> > > >   String name;
> > > >   // etc...
>
> > > > }
>
> > > > Subscribed {
> > > >   Long userKey;
> > > >   Long eventKey;
> > > >   Long start;
> > > >   Long end;
> > > >   //etc...
>
> > > > }
>
> > > > Now, if I have a particular user logged in, who has a set of friends,
> > > > I want to know which events that set of friends have subscribed to.
> > > > A normal query would be "SELECT * from SUBSCRIBED where userkey in
> > > > ( <the set of friends> )"
>
> > > > By now, I know that there are no conditional OR operations, joins, or
> > > > in queries allowed in app engine.
>
> > > > My initial attempt to solve this problem has resulted in writing out
> > > > the intersection at subscription time using an additional table,
> > > > something like:
>
> > > > FriendSubscribed extends Subscribed {
> > > >   Long friendKey;
> > > >   // etc...
>
> > > > }
>
> > > > // for every friend of the logged in user, write a record so the query
> > > > can simply pick up
> > > > // all records for that friend
> > > > void persistForFriends(Collection<Long> friends, Subscribed
> > > > subscribed) {
> > > >   for (Long id : friends) {
> > > >     pm.persist(new FriendSubscribed(id, subscribed));
> > > >   }
>
> > > > }
>
> > > > Now I have shifted the query burden to the write side and I can do a
> > > > simple query on the FriendSubscribed table for a particular user which
> > > > is nice and fast: "SELECT * from FRIENDSUBSCRIBED where friendkey =
> > > > <myuserid>".  This returns all the subscriptions that the logged in
> > > > user's friends have made (with some slight gotchas, e.g. if the user
> > > > adds a new friend, then any subscriptions that the new friend already
> > > > has will not be seen - this is tolerable)
>
> > > > So, what's the problem?  On facebook, a user typically has order 100
> > > > friends.  And I suspect many will have 500 and a few will have 1000+.
>
> > > > Even for order 100 friends, at the current write speed (5/s) I can
> > > > expect a write operation to take approx. 20s!!  This is rather long
> > > > for a web operation, and gets worse as you increase the number of
> > > > friends.
>
> > > > I am curious if anyone has any ideas for solutions?  I have a few
> > > > thoughts, but wanted to see what people thought before moving on to
> > > > the next optimization.  Here are my thoughts:
>
> > > > 1) Queue the friend write operation - it's not critical to the core
> > > > write operation that all the "join" data be written out, it can always
> > > > be re-computed at any point in time by analyzing the Subscribed
> > > > records and current friend list for each friend.  Only problem is I am
> > > > using Java and the scheduled tasks API is not yet supported.
> > > > Furthermore, it still seems like a lot of work to go through if we
> > > > imagine thousands or hundreds of thousands of users making
> > > > subscriptions as each write operation they do gets multiplied by a
> > > > factor of 100-1000.
>
> > > > 2) Make an owned relationship from Events to Subscribed. Something
> > > > like:
>
> > > > Event {
> > > >   Long key;
> > > >   String name;
>
> > > >   Set<Subscribed> subscribed;
> > > > // etc...
>
> > > > }
>
> > > > Now reading an event records gives precisely the information I want
> > > > and simply has to be culled down (the intersection of the logged in
> > > > user's friends and the Event subscribed list is the list of friends
> > > > that are subscribed to that event).  The problem I see with this
> > > > implementation is high contention.  For every subscription event I
> > > > have to lock out updates and make sure everyone is writing a coherent
> > > > view of the set (don't we have to update the value into the Event
> > > > record?)  Or I may be wrong and it may be that the ownership
> > > > relationship is maintained automatically by the datastore and this
> > > > isn't as much of a concern as I think it is as the owned relationship
> > > > doesn't really write anything into the owner record, it just does a
> > > > join-like query to populate the set on read...that might be ideal??
>
> > > > 3) ??  Something else?
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en
-~----------~----~----~----~------~----~------~--~---

[google-appengine] Re: Getting around joins/in queries

Reply via email to