Ah, thanks Nick! I actually started implementing some of those changes but got sidetracked with other things. I'm picking it up again now and will probably have more questions later :)
On Tue, Apr 20, 2010 at 4:23 AM, Nick Johnson (Google) <[email protected]> wrote:

> Hi Patrick,
>
> Good questions!
>
> On Tue, Apr 20, 2010 at 12:57 AM, Patrick Twohig <[email protected]> wrote:
>
>> Hi All,
>>
>> As I understand it, the process of performing a single fetch (a call to get()) from the datastore using a key basically involves finding the host housing the entity, opening a socket, fetching the data, and then cleaning up the connection. So to fetch something like 30 entities from the datastore, you're repeating the process 30 times over in serial, each time incurring whatever overhead is involved. I also read that if you perform bulk fetches (i.e. passing multiple keys at once) you can eliminate a great deal of that overhead. In one of the videos I watched from Google I/O 2009, the presenter (whose name I forget - d'oh) said that a bulk fetch actually performs the fetches in parallel from the datastore and you should see noticeably faster requests.
>>
>> Currently I have a few situations where the app performs many fetches from the datastore serially, rather than in bulk, and I believe that is why these requests are extremely slow and CPU intensive. Where possible, I have put in place as many bulk fetches as I can, but I'm a little stuck in a few places.
>>
>> I'm basing the fetch latency on today's numbers -- http://code.google.com/status/appengine/detail/datastore/2010/04/19. Anomalies aside, it looks like the get latency is somewhere between 80ms and 160ms; let's split the difference and just say that it's 120 milliseconds. Additionally, the query latency is somewhere between 250ms and 500ms. Splitting the difference, that's 375ms. I'm just going to use those numbers as a ballpark estimate for fetching multiple entities from the datastore; feel free to correct me if any of my reasoning is flawed or incorrect.
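To make the ballpark arithmetic above concrete, here is a minimal sketch in plain Python. The latency ranges are the ones quoted from the status page; the function names are made up for illustration, and the model (every serial get costs one full get latency, a query costs one flat query latency) is a simplifying assumption rather than a measurement.

```python
import math

# Latency ranges quoted from the status page in the message above (milliseconds).
GET_MS_LOW, GET_MS_HIGH = 80, 160
QUERY_MS_LOW, QUERY_MS_HIGH = 250, 500


def midpoint(low, high):
    """Split the difference between two bounds."""
    return (low + high) / 2


def serial_cost_ms(num_entities, get_ms):
    """Estimated cost of fetching entities one at a time (assumed linear)."""
    return num_entities * get_ms


def break_even_count(get_ms, query_ms):
    """Smallest number of serial gets that costs more than one query."""
    return math.floor(query_ms / get_ms) + 1


get_ms = midpoint(GET_MS_LOW, GET_MS_HIGH)        # 120.0
query_ms = midpoint(QUERY_MS_LOW, QUERY_MS_HIGH)  # 375.0

print(serial_cost_ms(30, get_ms))                 # 30 serial gets at the midpoint latency
print(break_even_count(get_ms, query_ms))         # break-even at the midpoint get latency
print(break_even_count(GET_MS_HIGH, query_ms))    # break-even at the worst-case get latency
```

Under these assumptions the break-even point moves with the get latency you plug in, so the estimate is only as good as the latency figures behind it.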
>
> The figures shown by the status site seem to be on the high side at the moment - they represent worst cases. In my own apps, gets are more on the order of 10-20ms, while queries vary widely depending on the data returned, but average about 100-300ms.
>
>> Example 1: http://imagepaste.nullnetwork.net/viewimage.php?id=830
>>
>> Given the above example, I'm assuming that if I performed an ancestor query with Foo("A") as the ancestor, it would effectively bulk-fetch the entire entity group. I could then use the result of that query to get the data I need. That would make the fetch from the datastore one query at 375 milliseconds, versus (7 entities * 160ms) or 1120ms. So long as you need 3 or more entities (3 * 160ms), it would stand to reason that you're better off just fetching the whole thing. In some simple tests I did, that seemed to be the case: the query approach was faster, and that's great if you know everything is in the same entity group.
>>
>> Example 2: http://imagepaste.nullnetwork.net/viewimage.php?id=831
>>
>> Given the above example, none of the entities are in the same entity group, but I would want to perform bulk fetches wherever possible. I would first fetch Foo("A"). I would then see that it has two key properties pointing to Bar("B") and Bar("C"), and fetch those two entities at once. Finally, I would see that Bar("B") and Bar("C") each reference two more entities -- Baz("D"), Baz("E"), Baz("F"), and Baz("G") -- for a total of four. In the worst case, I would fetch each entity individually, taking, once again, 1120ms. In the best case, where I perform 3 fetches (fetch A first, then B and C, then lastly D, E, F, and G), it would be more in the neighborhood of 480 milliseconds. It's still an improvement over fetching each entity individually, but not much.
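The three-round fetch described in Example 2 can be sketched as a level-by-level traversal that issues one batch per level. This is a plain-Python illustration, not App Engine code: `FakeDatastore`, `get_multi`, and `fetch_graph` are hypothetical names, the in-memory dict stands in for the datastore, and which Bar points at which Baz is assumed here since the message does not say. In the App Engine Python SDK the batch call would be `db.get()` with a list of keys.

```python
class FakeDatastore:
    """In-memory stand-in for a datastore that counts round trips (hypothetical)."""

    def __init__(self, entities):
        self._entities = entities  # key -> list of keys that entity references
        self.round_trips = 0

    def get_multi(self, keys):
        """Return all requested entities in one simulated round trip."""
        self.round_trips += 1
        return {key: self._entities[key] for key in keys}


def fetch_graph(store, root_key):
    """Fetch the root and everything reachable from it, one batch per level."""
    fetched = {}
    frontier = [root_key]
    while frontier:
        batch = store.get_multi(frontier)
        fetched.update(batch)
        # Collect the keys referenced by this level that we have not seen yet.
        next_frontier = []
        for refs in batch.values():
            for key in refs:
                if key not in fetched and key not in next_frontier:
                    next_frontier.append(key)
        frontier = next_frontier
    return fetched


# Assumed shape for Example 2: A -> B, C; B -> D, E; C -> F, G.
store = FakeDatastore({
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F", "G"],
    "D": [], "E": [], "F": [], "G": [],
})
graph = fetch_graph(store, "A")
print(len(graph), store.round_trips)
```

The number of round trips equals the depth of the reference graph rather than the number of entities, which is where the saving over seven individual gets comes from.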
>
> Very similar to this is the 'ReferenceProperty prefetching' pattern - see http://blog.notdot.net/2010/01/ReferenceProperty-prefetching-in-App-Engine
>
>> So I was thinking of ways to improve this, the second example in particular, because I have a few places in my app where that exact thing is happening. Right now it's actually implemented with individual fetches, but it is backed by memcache in many circumstances, so that definitely helps.
>>
>> So given that, here are my questions...
>>
>>    - When serializing the objects, would it be worthwhile adding some sort of metadata to the entity that would tell me what other entities it references (either directly or indirectly) so that I could fetch the whole thing with one or two API calls? I was thinking that an entity could have child entities with all the keys it references directly or indirectly. This would be a huge pain to implement, and I'm not sure it would give a noticeable performance boost.
>
> Certainly, if you experience serial gets as a significant problem that isn't solved with simple prefetching, this could be worth doing. I would avoid using child entities, however, and simply have a list of keys instead.
>
>>    - Is there something "under the covers" of the API that actually makes more efficient use of resources that I don't know about?
>
> There's a lot 'under the covers' - what specifically are you thinking of?
>
> -Nick Johnson
>
>>    - Is there something in the API that I don't know about that could make the second example faster w/o much effort?
>>    - Is my design just bad, and should I figure out a better way of doing it? If so, how would I go about doing that?
>>
>> Alright, that's all for now.
>>
>> Thanks,
>> Patrick.
>>
>> --
>> Patrick H. Twohig.
>>
>> Namazu Studios
>> P.O.
Box 34161
>> San Diego, CA 92163-4161
>>
>> --
>> You received this message because you are subscribed to the Google Groups "Google App Engine" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to [email protected].
>> For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
>
> --
> Nick Johnson, Developer Programs Engineer, App Engine
> Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: 368047

--
Patrick H. Twohig.

Namazu Studios
P.O. Box 34161
San Diego, CA 92163-4161
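Nick's suggestion in the thread, keeping a plain list of keys on the entity instead of child entities, could look roughly like the sketch below. It is plain Python with an in-memory dict standing in for the datastore; `FakeDatastore`, `get_multi`, `collect_descendant_keys`, and the `refs` and `all_keys` field names are all hypothetical, and the reference layout is the assumed shape of Example 2.

```python
def collect_descendant_keys(entities, root_key):
    """Compute, at write time, every key reachable from root_key."""
    seen = []
    stack = list(entities[root_key]["refs"])
    while stack:
        key = stack.pop()
        if key not in seen:
            seen.append(key)
            stack.extend(entities[key]["refs"])
    return sorted(seen)


class FakeDatastore:
    """In-memory stand-in for a datastore that counts round trips (hypothetical)."""

    def __init__(self, entities):
        self._entities = entities
        self.round_trips = 0

    def get_multi(self, keys):
        """Return all requested entities in one simulated round trip."""
        self.round_trips += 1
        return {key: self._entities[key] for key in keys}


# Assumed shape for Example 2: A -> B, C; B -> D, E; C -> F, G.
entities = {
    "A": {"refs": ["B", "C"]},
    "B": {"refs": ["D", "E"]},
    "C": {"refs": ["F", "G"]},
    "D": {"refs": []}, "E": {"refs": []},
    "F": {"refs": []}, "G": {"refs": []},
}

# Written once, when the root entity is saved.
entities["A"]["all_keys"] = collect_descendant_keys(entities, "A")

store = FakeDatastore(entities)
root = store.get_multi(["A"])["A"]        # call 1: the root, which carries the key list
rest = store.get_multi(root["all_keys"])  # call 2: everything it references
print(store.round_trips, sorted(rest))
```

The trade-off in this sketch is that the stored key list has to be recomputed whenever any reference in the graph changes, so it suits data that is read far more often than it is restructured.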
