Hi Patrick,

Good questions!

On Tue, Apr 20, 2010 at 12:57 AM, Patrick Twohig <[email protected]> wrote:

> Hi All,
>
> As I understand it, the process of performing a single fetch (call to
> get()) from the datastore using a key basically involves finding the host
> housing the entity, opening a socket, fetching the data, and then cleaning
> up the connection. So to fetch something like 30 entities from the
> datastore, you're repeating the process 30 times over in serial, each time
> incurring whatever overhead is involved. I also read that if you perform
> bulk fetches (i.e. passing multiple keys at once), you can eliminate a great
> deal of that overhead. In one of the videos I watched from Google I/O 2009,
> the presenter (whose name I forget - d'oh) said that performing a bulk fetch
> actually performs the fetches in parallel from the datastore and you should
> see requests complete noticeably faster.
>
> Currently I have a few situations where the app performs many fetches from
> the datastore serially, rather than in bulk, and I believe that is why
> these requests are extremely slow and CPU intensive. Where possible, I put
> into place as many bulk fetches as I can, but I'm a little stuck in a few
> places.
>
> I'm basing the fetch latency on today's numbers --
> http://code.google.com/status/appengine/detail/datastore/2010/04/19.
> Anomalies aside, it looks like the get latency is somewhere between 80ms and
> 160ms; let's split the difference and just say that it's 120 milliseconds.
> Additionally, the query latency is somewhere between 250ms and 500ms.
> Splitting the difference, that's 375ms. I'm just going to use those numbers
> as a ballpark estimate for fetching multiple entities from the datastore;
> feel free to correct me if any of my reasoning is flawed or incorrect.
>

The figures shown by the status site seem to be on the high side at the moment - they represent worst cases.
In my own apps, gets take more like 10-20ms, while queries vary widely depending on the data returned but average about 100-300ms.

> Example 1: http://imagepaste.nullnetwork.net/viewimage.php?id=830
>
> Given the above example, I'm assuming that if I performed an ancestor query
> with Foo("A") as the ancestor it would effectively bulk-fetch the entire
> entity group. I could then use the result of that query to get the data I
> need. That would make the fetch from the datastore one query, 375
> milliseconds, versus (7 entities * 160ms) or 1120ms. So long as you need 3
> or more entities (3 * 160ms) it would stand to reason that you're better
> off just fetching the whole thing. In some simple tests I did, that seemed
> to be the case: the query approach was faster, and that's great if you know
> everything is in the same entity group.
>
> Example 2: http://imagepaste.nullnetwork.net/viewimage.php?id=831
>
> Given the above example, none of the entities are in the same entity group,
> but I would want to try to perform bulk fetches wherever possible. I would
> first fetch Foo("A"). I would then see that it has two key properties
> pointing to Bar("B") and Bar("C"), and perform a fetch of those two entities
> at once. Finally, I would see that Bar("B") and Bar("C") each reference two
> more entities -- Baz("D"), Baz("E"), Baz("F"), and Baz("G") for a total of
> four. In the worst case, I would fetch each entity individually, taking,
> once again, 1120ms. In the best case, where I perform 3 fetches (fetch A
> first, then fetch B and C, then lastly fetch D, E, F, and G), it would be
> more in the neighborhood of 480 milliseconds. It's still an improvement
> over fetching each entity individually, but not much.
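
For reference, here is a minimal sketch of the three-round fetch described in Example 2. It is not real datastore code: `FAKE_DATASTORE` and `batch_get` are hypothetical in-memory stand-ins for a single batch call such as db.get() with a list of keys, and the 160ms figure is just the worst-case number used in the estimate above.

```python
# Hypothetical stand-in for the datastore in Example 2:
# each key maps to the keys that entity references.
FAKE_DATASTORE = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F", "G"],
    "D": [], "E": [], "F": [], "G": [],
}


def batch_get(keys):
    """Stand-in for ONE batch get; returns (key, referenced_keys) pairs."""
    return [(key, FAKE_DATASTORE[key]) for key in keys]


def fetch_graph(root_key):
    """Fetch everything reachable from root_key, one batch call per level."""
    fetched = {}
    round_trips = 0
    frontier = [root_key]
    while frontier:
        round_trips += 1
        next_frontier = []
        for key, refs in batch_get(frontier):
            fetched[key] = refs
            for ref in refs:
                # Skip keys already fetched or already queued.
                already_seen = (ref in fetched or ref in frontier
                                or ref in next_frontier)
                if not already_seen:
                    next_frontier.append(ref)
        frontier = next_frontier
    return fetched, round_trips


GET_MS = 160  # assumed worst-case latency per call, from the estimate above

entities, calls = fetch_graph("A")
print(calls, calls * GET_MS, len(entities) * GET_MS)  # prints: 3 480 1120
```

The number of calls equals the depth of the reference graph rather than the number of entities, which is where the 480ms versus 1120ms estimate comes from.
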
>

Very similar to this is the 'ReferenceProperty prefetching' pattern - see
http://blog.notdot.net/2010/01/ReferenceProperty-prefetching-in-App-Engine

> So I was thinking of ways to improve this, the second example in
> particular, because I have a few places in my app where that exact thing is
> happening. Right now it's actually implemented with individual fetches, but
> it is backed by memcache in many circumstances, so that definitely helps.
>
> So given that, here are my questions...
>
> - When serializing the objects, would it be worthwhile adding some sort
> of metadata in the entity that would tell me what other entities it
> references (either directly or indirectly) so that I could fetch the whole
> thing with one or two API calls? I was thinking that an entity could have
> child entities with all the keys it references directly or indirectly.
> This would be a huge pain to implement, and I'm not sure it would make a
> noticeable performance boost.
>

Certainly, if serial gets are a significant problem for you that simple prefetching doesn't solve, this could be worth doing. I would avoid using child entities, however, and simply store a list of keys on the entity instead.

> - Is there something "under the covers" of the API that actually makes
> more efficient usage of resources that I don't know about?
>

There's a lot 'under the covers' - what specifically are you thinking of?

-Nick Johnson

> - Is there something in the API that I don't know about that could make
> the second example faster w/o much effort?
> - Is my design just bad and I should figure out a better way of doing
> it? If so, how would I go about doing that?
>
> Alright, that's all for now.
>
> Thanks,
> Patrick.
>
> --
> Patrick H. Twohig.
>
> Namazu Studios
> P.O. Box 34161
> San Diego, CA 92163-4161
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.
>

--
Nick Johnson, Developer Programs Engineer, App Engine
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: 368047
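
To illustrate the list-of-keys suggestion made earlier in this reply, here is a minimal sketch. It assumes the root entity carries a list of every key it references, directly or indirectly. `FakeStore` and the `all_keys` field are hypothetical names used only for illustration; on App Engine the list might be stored as a db.ListProperty(db.Key), and `batch_get` stands in for one batch call such as db.get() with a list of keys.

```python
class FakeStore:
    """Hypothetical in-memory stand-in that counts batch calls."""

    def __init__(self, entities):
        self.entities = entities
        self.calls = 0

    def batch_get(self, keys):
        """Stand-in for ONE batch get of several keys."""
        self.calls += 1
        return [self.entities[key] for key in keys]


# The root entity "A" lists every key it references, directly or indirectly.
store = FakeStore({
    "A": {"name": "A", "all_keys": ["B", "C", "D", "E", "F", "G"]},
    "B": {"name": "B", "all_keys": []},
    "C": {"name": "C", "all_keys": []},
    "D": {"name": "D", "all_keys": []},
    "E": {"name": "E", "all_keys": []},
    "F": {"name": "F", "all_keys": []},
    "G": {"name": "G", "all_keys": []},
})


def fetch_with_key_list(store, root_key):
    """Fetch the root, then everything it lists, in at most two calls."""
    root = store.batch_get([root_key])[0]
    related = store.batch_get(root["all_keys"]) if root["all_keys"] else []
    return [root] + related


result = fetch_with_key_list(store, "A")
print(store.calls, [entity["name"] for entity in result])
# prints: 2 ['A', 'B', 'C', 'D', 'E', 'F', 'G']
```

The trade-off is that the list has to be kept up to date whenever the references change, so this only pays off when reads clearly outnumber writes.
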
