[google-appengine] Re: get_by_key_name vs fetch performance

ryan Wed, 10 Feb 2010 14:53:58 -0800

hi all! great discussion. thanks for the original post and
measurements, waldemar! in short, you're right, the 1.3.1 datastore
backend in production includes a number of improvements to both query
performance and fault tolerance.

for query performance, we turned on a new code path that parallelizes
internal operations and bigtable scans and lookups more aggressively,
which is likely the reason for the improvement in query fetches
relative to gets that you saw.
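A toy model of why fanning out independent lookups can cut wall-clock time. This is not the datastore backend's code, which isn't shown anywhere in this thread: the hypothetical `lookup()` just sleeps for a made-up latency, so the sequential version pays that latency once per key while the thread-pool version pays it roughly once overall.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SIMULATED_LATENCY = 0.05  # hypothetical per-lookup latency in seconds


def lookup(key):
    """Hypothetical stand-in for one storage lookup; it only sleeps."""
    time.sleep(SIMULATED_LATENCY)
    return "entity-%d" % key


def sequential_lookups(keys):
    # Each lookup waits for the previous one to finish.
    return [lookup(k) for k in keys]


def parallel_lookups(keys, workers=8):
    # Independent lookups are issued concurrently; map() keeps input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lookup, keys))


def timed(fn, keys):
    """Return (result, elapsed seconds) for fn(keys)."""
    start = time.perf_counter()
    result = fn(keys)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    keys = list(range(8))
    seq, t_seq = timed(sequential_lookups, keys)
    par, t_par = timed(parallel_lookups, keys)
    assert seq == par  # same entities, same order
    print("sequential: %.3fs  parallel: %.3fs" % (t_seq, t_par))
```

With eight keys the sequential run should take roughly eight simulated latencies and the parallel run roughly one; real lookups add contention and per-call overhead that this toy ignores.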

for fault tolerance, we're now doing more retries in the backend
automatically, usually up to the full 30s request deadline, for most
calls - basically everything except transaction commits, which are
retried on the client side instead of in the backend. (if you're using
python, you might now want to try db.run_in_transaction_custom_retries()
with a high number of retries, e.g. 10, instead of just
db.run_in_transaction(). similar java support should be coming soon.)
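On App Engine itself the call takes the retry count first, e.g. `db.run_in_transaction_custom_retries(10, some_txn_function, arg)`. The sketch below only mimics the client-side retry idea so it can run without the SDK: `CommitConflict`, `run_with_custom_retries` and `make_flaky_commit` are hypothetical stand-ins, and the local `TransactionFailedError` merely borrows the name of the SDK's `db.TransactionFailedError`. In this sketch `retries` counts attempts after the first one.

```python
class CommitConflict(Exception):
    """Hypothetical: raised when a commit loses a contention race."""


class TransactionFailedError(Exception):
    """Local stand-in named after the SDK's db.TransactionFailedError."""


def run_with_custom_retries(retries, function, *args, **kwargs):
    """Client-side retry sketch: one initial attempt plus up to `retries`
    further attempts, re-running `function` after each conflict."""
    for _ in range(retries + 1):
        try:
            return function(*args, **kwargs)
        except CommitConflict:
            continue  # a real client would begin a fresh transaction here
    raise TransactionFailedError("still conflicting after %d retries" % retries)


def make_flaky_commit(failures):
    """Hypothetical commit that conflicts `failures` times, then succeeds.
    Returns the commit function and a dict counting how often it ran."""
    calls = {"n": 0}

    def commit():
        calls["n"] += 1
        if calls["n"] <= failures:
            raise CommitConflict()
        return "committed on attempt %d" % calls["n"]

    return commit, calls


if __name__ == "__main__":
    commit, calls = make_flaky_commit(failures=3)
    print(run_with_custom_retries(10, commit))  # committed on attempt 4
```

With a budget of 10, three conflicts in a row are absorbed; with `retries=2` the same flaky commit would raise `TransactionFailedError` after its third failed attempt.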

we'll mention more detail in the official release notes and blog post,
but based on a day or so of results so far, we're already seeing a
substantial drop in error rate, mostly due to reduced timeouts, across
the board. we're also seeing that the error rate is much less spiky, which
is always good.

On Feb 10, 8:47 am, Waldemar Kornewald <[email protected]> wrote:
> Hi,
> were there any optimizations to the datastore lately? We did a few
> Model.get_by_key_name vs Query.fetch() benchmarks (code is attached)
> and it looks like the difference is minimal for individual
> gets/fetches and practically non-existent for batch-gets vs
> batch-fetch for the same entities.
>
> Here we do 1000 individual get()s: http://kornewald.appspot.com/get
>
> Here we do 1000 individual fetch()es for the same
> entities: http://kornewald.appspot.com/fetch
>
> Here we do four batch-get()s of 250 entities
> each: http://kornewald.appspot.com/batchget
>
> Here we do four batch-fetch()es for 250 entities
> each: http://kornewald.appspot.com/batchfetch
>
> The number returned is the time needed for retrieving the entities, so
> the first two basically show the time per single get()/fetch().
>
> Is there anything wrong with the benchmark code?
>
> Our previous benchmarks showed a much more significant difference (3x
> slower fetch()). Now it's merely a 30% difference and the few
> milliseconds can hardly be noticed by the end-user.
>
> Can we stop designing models like crazy around key names because there
> is hardly any benefit in the added complexity or inconvenience in most
> cases (e.g., not being able to change the key name afterwards)?
>
> It looks like the only case where batch-get()s are useful is when you
> can't formulate a single fetch() for the same kind of query.
>
> Bye,
> Waldemar
>
> [attachment: guestbook.zip, 2K]
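The attached benchmark code isn't reproduced here. As a rough sketch of the kind of measurement described in the quoted message (1000 individual calls vs. four batches of 250), a generic timing harness might look like this; `fake_fetch_one`, `fake_fetch_batch` and `ROUND_TRIP` are hypothetical stand-ins for real get_by_key_name()/fetch() calls, simulated with a sleep so the harness runs anywhere.

```python
import time


def time_individual(fetch_one, keys):
    """Total seconds to retrieve each key with its own call."""
    start = time.perf_counter()
    for key in keys:
        fetch_one(key)
    return time.perf_counter() - start


def time_batched(fetch_batch, keys, batch_size):
    """Total seconds to retrieve all keys in batches of `batch_size`."""
    start = time.perf_counter()
    for i in range(0, len(keys), batch_size):
        fetch_batch(keys[i:i + batch_size])
    return time.perf_counter() - start


# Hypothetical backend: one fixed round trip per call, whatever the batch size.
ROUND_TRIP = 0.001  # seconds, made-up value


def fake_fetch_one(key):
    time.sleep(ROUND_TRIP)
    return key


def fake_fetch_batch(batch):
    time.sleep(ROUND_TRIP)
    return list(batch)


if __name__ == "__main__":
    keys = list(range(1000))
    total = time_individual(fake_fetch_one, keys)
    print("individual: %.3fs total, %.3f ms per call"
          % (total, total / len(keys) * 1000))
    print("batched (4 x 250): %.3fs" % time_batched(fake_fetch_batch, keys, 250))
```

Because the fake backend charges the same round trip per call no matter how many keys it receives, this toy only shows how per-call overhead gets amortized by batching; it says nothing about the real datastore's numbers, which is what the benchmark URLs above measure.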

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.
