Ah, thanks Nick!  I had started implementing some of those changes but got
sidetracked with other things; I'm picking it up again now.  I'll probably
have more questions later :)

On Tue, Apr 20, 2010 at 4:23 AM, Nick Johnson (Google) <
[email protected]> wrote:

> Hi Patrick,
>
> Good questions!
>
> On Tue, Apr 20, 2010 at 12:57 AM, Patrick Twohig <
> [email protected]> wrote:
>
>> Hi All,
>>
>> As I understand it, performing a single fetch (a call to get()) from the
>> datastore by key basically involves finding the host that houses the
>> entity, opening a socket, fetching the data, and then cleaning up the
>> connection.  So to fetch something like 30 entities from the datastore,
>> you're repeating that process 30 times in serial, incurring whatever
>> overhead is involved each time.  I also read that if you perform bulk
>> fetches (i.e. passing multiple keys at once), you can eliminate a great
>> deal of that overhead.  In one of the Google I/O 2009 videos I watched, the
>> presenter (whose name I forget - d'oh) said that a bulk fetch actually
>> performs the fetches from the datastore in parallel, so requests should be
>> noticeably faster.
>>
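A rough way to picture the difference: the sketch below uses a hypothetical in-memory `FakeDatastore` (not a real App Engine class) that only counts round trips, so 30 individual gets cost 30 round trips while one batch call costs 1. In the Python SDK, `db.get()` accepts either a single key or a list of keys, which is the batch form being described; whether the backend really fetches the batch in parallel is as reported in the talk, not something this toy demonstrates.

```python
from typing import Dict, Iterable, List, Optional


class FakeDatastore:
    """Hypothetical in-memory stand-in for the datastore that counts RPC round trips."""

    def __init__(self, entities: Dict[str, dict]) -> None:
        self._entities = dict(entities)
        self.round_trips = 0

    def get(self, key: str) -> Optional[dict]:
        # One round trip per key, like calling get() once per entity.
        self.round_trips += 1
        return self._entities.get(key)

    def get_multi(self, keys: Iterable[str]) -> List[Optional[dict]]:
        # One round trip for the whole batch; results follow the order of `keys`.
        self.round_trips += 1
        return [self._entities.get(k) for k in keys]


keys = [f"Foo:{i}" for i in range(30)]
store = FakeDatastore({k: {"id": i} for i, k in enumerate(keys)})

serial = [store.get(k) for k in keys]
serial_trips = store.round_trips      # 30 round trips

store.round_trips = 0
batched = store.get_multi(keys)
batched_trips = store.round_trips     # 1 round trip

print(serial_trips, batched_trips)    # 30 1
```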
>> Currently I have a few situations where the app performs many fetches from
>> the datastore serially rather than in bulk, and I believe that is why these
>> requests are extremely slow and CPU intensive.  Where possible I have
>> switched to bulk fetches, but I'm a little stuck in a few places.
>>
>> I'm basing the fetch latency on today's numbers --
>> http://code.google.com/status/appengine/detail/datastore/2010/04/19.
>> Anomalies aside, it looks like get latency is somewhere between 80ms and
>> 160ms; let's split the difference and say it's 120ms.  Query latency is
>> somewhere between 250ms and 500ms; splitting the difference, that's
>> 375ms.  I'm going to use those numbers as a ballpark estimate for fetching
>> multiple entities from the datastore, so feel free to correct me if any of
>> my reasoning is flawed.
>>
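The "split the difference" arithmetic, and the break-even point it implies, can be written out as a small calculation. `break_even` is a hypothetical helper that returns the smallest number of serial gets whose combined latency exceeds one query. With the 120ms midpoint the break-even is 4 gets; with the 160ms upper bound used in Example 1 it is 3.

```python
import math


def midpoint(low_ms: float, high_ms: float) -> float:
    """Split the difference between a low and a high latency estimate."""
    return (low_ms + high_ms) / 2


def break_even(get_ms: float, query_ms: float) -> int:
    """Smallest n such that n serial gets take strictly longer than one query."""
    return math.floor(query_ms / get_ms) + 1


get_ms = midpoint(80, 160)      # 120.0 ms per get
query_ms = midpoint(250, 500)   # 375.0 ms per query

print(break_even(get_ms, query_ms))  # 4, since 4 * 120 = 480 > 375
print(break_even(160, query_ms))     # 3, since 3 * 160 = 480 > 375
```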
>
> The figures shown by the status site seem to be on the high side at the
> moment - they represent worst cases. In my own apps, gets are observed to be
> more on the order of 10-20ms, while queries vary widely depending on
> returned data, but average about 100-300ms.
>
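For comparison, here is the same back-of-the-envelope arithmetic with Nick's figures plugged in. Taking the midpoints of 10-20ms per get and 100-300ms per query is an assumption (queries in particular vary with how much data they return), but under it seven serial gets come to about 105ms, and it would take roughly 14 serial gets before a single query is cheaper.

```python
import math

# Assumed midpoints of Nick's observed ranges (not measured values).
get_ms = (10 + 20) / 2        # 15.0 ms per get
query_ms = (100 + 300) / 2    # 200.0 ms per query

# Example 1 fetched key by key: 7 serial gets.
seven_serial_gets_ms = 7 * get_ms                     # 105.0 ms

# Smallest n for which n serial gets take strictly longer than one query.
break_even_gets = math.floor(query_ms / get_ms) + 1   # 14

print(seven_serial_gets_ms, break_even_gets)          # 105.0 14
```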
>
>> Example 1: http://imagepaste.nullnetwork.net/viewimage.php?id=830
>>
>> Given the above example, I'm assuming that if I performed an ancestor
>> query with Foo("A") as the ancestor, it would effectively bulk-fetch the
>> entire entity group.  I could then use the result of that query to get the
>> data I need.  That makes the fetch from the datastore one query, 375ms,
>> versus 7 entities * 160ms = 1120ms (using the upper-bound get latency
>> rather than the 120ms midpoint).  As long as you need 3 or more entities
>> (3 * 160ms = 480ms, already more than one query), it stands to reason
>> that you're better off just fetching the whole thing.  In some simple
>> tests I did that seemed to be the case: the query approach was faster,
>> and that's great if you know everything is in the same entity group.
>>
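The ancestor-query idea can be modelled by representing keys as ancestor paths: everything in the entity group shares the root's path as a prefix, so one query returns the whole group. The sketch uses a hypothetical in-memory store and a made-up group of seven entities (the structure in the linked image isn't reproduced here); it counts round trips only, not real latency.

```python
from typing import Dict, Tuple

# A key is modelled as its ancestor path, e.g. (("Foo", "A"), ("Child", "1")).
Key = Tuple[Tuple[str, str], ...]


class FakeDatastore:
    """Hypothetical in-memory datastore that counts round trips."""

    def __init__(self, entities: Dict[Key, dict]) -> None:
        self._entities = entities
        self.round_trips = 0

    def get(self, key: Key) -> dict:
        self.round_trips += 1
        return self._entities[key]

    def ancestor_query(self, ancestor: Key) -> Dict[Key, dict]:
        # One round trip returns the ancestor itself plus every descendant,
        # i.e. every key whose path starts with the ancestor's path.
        self.round_trips += 1
        n = len(ancestor)
        return {k: v for k, v in self._entities.items() if k[:n] == ancestor}


root: Key = (("Foo", "A"),)
group: Dict[Key, dict] = {root: {"id": 0}}
for i in range(1, 7):  # six hypothetical descendants, seven entities in total
    group[root + (("Child", str(i)),)] = {"id": i}

unrelated: Key = (("Foo", "Z"),)
store = FakeDatastore({**group, unrelated: {"id": 99}})

for key in group:              # key by key: one round trip each
    store.get(key)
per_key_trips = store.round_trips          # 7

store.round_trips = 0
whole_group = store.ancestor_query(root)   # whole group in one round trip
query_trips = store.round_trips            # 1
```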
>> Example 2:  http://imagepaste.nullnetwork.net/viewimage.php?id=831
>>
>> Given the above example, none of the entities are in the same entity
>> group, but I would still want to perform bulk fetches wherever possible.  I
>> would first fetch Foo("A").  I would then see that it has two key properties
>> pointing to Bar("B") and Bar("C"), and fetch those two entities at once.
>> Finally, I would see that Bar("B") and Bar("C") each reference two more
>> entities, Baz("D"), Baz("E"), Baz("F"), and Baz("G"), for a total of four.
>> In the worst case I would fetch each entity individually, taking, once
>> again, 1120ms.  In the best case, where I perform 3 fetches (fetch A first,
>> then B and C, then lastly D, E, F, and G), it would be more in the
>> neighborhood of 480ms.  It's still an improvement over fetching each
>> entity individually, but not much.
>>
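The "best case" in Example 2 is essentially a breadth-first walk that issues one batch get per level of references. A minimal sketch, assuming a hypothetical in-memory store whose `get_multi` costs one round trip and entities that simply list the keys they reference:

```python
from typing import Dict, List

# Hypothetical entities from Example 2: each key maps to the keys it references.
ENTITIES: Dict[str, List[str]] = {
    "Foo:A": ["Bar:B", "Bar:C"],
    "Bar:B": ["Baz:D", "Baz:E"],
    "Bar:C": ["Baz:F", "Baz:G"],
    "Baz:D": [], "Baz:E": [], "Baz:F": [], "Baz:G": [],
}


class FakeDatastore:
    """Hypothetical store: each get_multi call counts as one round trip."""

    def __init__(self, entities: Dict[str, List[str]]) -> None:
        self._entities = entities
        self.round_trips = 0

    def get_multi(self, keys: List[str]) -> Dict[str, List[str]]:
        self.round_trips += 1
        return {k: self._entities[k] for k in keys}


def fetch_by_level(store: FakeDatastore, root: str) -> Dict[str, List[str]]:
    """Fetch root and everything it references, one batch get per level."""
    fetched: Dict[str, List[str]] = {}
    frontier = [root]
    while frontier:
        batch = store.get_multi(frontier)
        fetched.update(batch)
        # Next level: referenced keys not fetched yet (deduplicated, cycle-safe).
        frontier = sorted({ref for refs in batch.values()
                           for ref in refs if ref not in fetched})
    return fetched


store = FakeDatastore(ENTITIES)
result = fetch_by_level(store, "Foo:A")
print(store.round_trips, len(result))   # 3 7
```

For this graph the walk takes 3 round trips; at 160ms each (assuming a batch costs about the same as one get), that matches the 480ms estimate.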
>
> Very similar to this is the 'referenceproperty prefetching' pattern - see
> http://blog.notdot.net/2010/01/ReferenceProperty-prefetching-in-App-Engine
>
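The prefetching pattern in that post boils down to this: collect the referenced keys from a whole list of entities, fetch them with one batch get, and attach the results. The sketch below is a plain-Python illustration of that idea, not the post's code; `Post`, `author_key` and `FakeDatastore` are hypothetical stand-ins for a model with a ReferenceProperty and for the datastore.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Post:
    title: str
    author_key: str                 # hypothetical reference: stores only a key
    author: Optional[dict] = None   # resolved entity, filled in by prefetching


class FakeDatastore:
    """Hypothetical store: each get_multi call counts as one round trip."""

    def __init__(self, entities: Dict[str, dict]) -> None:
        self._entities = entities
        self.round_trips = 0

    def get_multi(self, keys: Iterable[str]) -> Dict[str, dict]:
        self.round_trips += 1
        return {k: self._entities[k] for k in keys}


def prefetch_authors(store: FakeDatastore, posts: List[Post]) -> None:
    """Resolve every post's author with one batch get instead of one get per post."""
    keys = sorted({p.author_key for p in posts})   # deduplicate shared references
    authors = store.get_multi(keys)
    for p in posts:
        p.author = authors[p.author_key]


store = FakeDatastore({"User:1": {"name": "ann"}, "User:2": {"name": "bob"}})
posts = [Post("a", "User:1"), Post("b", "User:2"), Post("c", "User:1")]
prefetch_authors(store, posts)
print(store.round_trips, [p.author["name"] for p in posts])  # 1 ['ann', 'bob', 'ann']
```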
>
>>
>> So I was thinking of ways to improve this, the second example in
>> particular, because I have a few places in my app where exactly that is
>> happening.  Right now it's implemented with individual fetches, but it is
>> backed by memcache in many cases, so that definitely helps.
>>
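Batching and memcache combine well: look up all the keys in the cache at once, then fetch only the misses from the datastore in a single batch. The Python memcache API does offer batch calls (`get_multi` / `set_multi`), but in this sketch a plain dict stands in for the cache and `FakeDatastore` is a hypothetical stand-in for the datastore.

```python
from typing import Dict, List


class FakeDatastore:
    """Hypothetical store: each get_multi call counts as one round trip."""

    def __init__(self, entities: Dict[str, dict]) -> None:
        self._entities = entities
        self.round_trips = 0

    def get_multi(self, keys: List[str]) -> Dict[str, dict]:
        self.round_trips += 1
        return {k: self._entities[k] for k in keys}


def cached_get_multi(cache: Dict[str, dict], store: FakeDatastore,
                     keys: List[str]) -> List[dict]:
    """Serve hits from the cache, fetch all misses in one batch, then fill the cache."""
    found = {k: cache[k] for k in keys if k in cache}
    missing = [k for k in keys if k not in found]
    if missing:
        fetched = store.get_multi(missing)
        cache.update(fetched)   # next time these are cache hits
        found.update(fetched)
    return [found[k] for k in keys]


store = FakeDatastore({"Baz:D": {"v": "d"}, "Baz:E": {"v": "e"}, "Baz:F": {"v": "f"}})
cache = {"Baz:D": {"v": "d"}}   # one key already cached
keys = ["Baz:D", "Baz:E", "Baz:F"]

first = cached_get_multi(cache, store, keys)    # one batch for the two misses
second = cached_get_multi(cache, store, keys)   # all hits, no datastore call
print(store.round_trips)                        # 1
```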
>> So given that, here are my questions...
>>
>>    - When serializing the objects, would it be worthwhile to add some
>>    sort of metadata to each entity listing the other entities it
>>    references (directly or indirectly), so that I could fetch the whole
>>    thing with one or two API calls?  I was thinking that an entity could
>>    have child entities holding all the keys it references directly or
>>    indirectly.  This would be a huge pain to implement, and I'm not sure
>>    it would give a noticeable performance boost.
>>
>>
> Certainly, if you experience serial gets as a significant problem that
> isn't solved with simple prefetching, this could be worth doing. I would
> avoid using child entities, however, and simply have a list of keys instead.
>
>
>>    - Is there something "under the covers" of the API that actually makes
>>    more efficient usage of resources that I don't know about?
>>
>>
> There's a lot 'under the covers' - what specifically are you thinking of?
>
> -Nick Johnson
>
>
>>    - Is there something in the API that I don't know about that could
>>    make the second example faster w/o much effort?
>>    - Is my design just bad and I should figure out a better way of doing
>>    it?  If so, how would I go about doing that?
>>
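Nick's suggestion of keeping a plain list of keys, rather than child entities, turns the read path into two round trips: get the root, then batch-get every key in its list. A minimal sketch, assuming a hypothetical `all_refs` property computed at write time and an in-memory stand-in store; the catch is that the list has to be recomputed whenever any reference in the graph changes, which is where the implementation pain comes in.

```python
from typing import Dict, List, Set, Tuple


class FakeDatastore:
    """Hypothetical store that counts round trips."""

    def __init__(self) -> None:
        self.entities: Dict[str, dict] = {}
        self.round_trips = 0

    def get(self, key: str) -> dict:
        self.round_trips += 1
        return self.entities[key]

    def get_multi(self, keys: List[str]) -> Dict[str, dict]:
        self.round_trips += 1
        return {k: self.entities[k] for k in keys}


def reachable_keys(entities: Dict[str, dict], root: str) -> List[str]:
    """Keys referenced directly or indirectly from root (run at write time)."""
    seen: Set[str] = set()
    stack = list(entities[root]["refs"])
    while stack:
        key = stack.pop()
        if key not in seen:
            seen.add(key)
            stack.extend(entities[key]["refs"])
    return sorted(seen)


def fetch_with_key_list(store: FakeDatastore, root_key: str) -> Tuple[dict, Dict[str, dict]]:
    """Read path: one get for the root, one batch get for its denormalized key list."""
    root = store.get(root_key)
    return root, store.get_multi(root["all_refs"])


store = FakeDatastore()
for key, refs in {"Foo:A": ["Bar:B", "Bar:C"], "Bar:B": ["Baz:D", "Baz:E"],
                  "Bar:C": ["Baz:F", "Baz:G"], "Baz:D": [], "Baz:E": [],
                  "Baz:F": [], "Baz:G": []}.items():
    store.entities[key] = {"refs": refs}

# Write time: store the full list of reachable keys on the root entity.
store.entities["Foo:A"]["all_refs"] = reachable_keys(store.entities, "Foo:A")

root, referenced = fetch_with_key_list(store, "Foo:A")
print(store.round_trips, sorted(referenced))
# 2 ['Bar:B', 'Bar:C', 'Baz:D', 'Baz:E', 'Baz:F', 'Baz:G']
```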
>> Alright, that's all for now.
>>
>> Thanks,
>> Patrick.
>>
>> --
>> Patrick H. Twohig.
>>
>> Namazu Studios
>> P.O. Box 34161
>> San Diego, CA 92163-4161
>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "Google App Engine" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected].
>> For more options, visit this group at
>> http://groups.google.com/group/google-appengine?hl=en.
>>
>
>
>
> --
> Nick Johnson, Developer Programs Engineer, App Engine
> Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number:
> 368047
>
>



