Hi Per,

We have been seeing similar bursts of DeadlineExceededErrors (DEE) on
our application (appid: steprep) for several weeks now.

The symptom: datastore put() calls that normally complete in
100-150 ms suddenly take 30+ seconds and cause a DEE. Usually, many
processes are hit at the same moment, yielding an "avalanche" of
errors (I like your term!). We are not using Entity Groups, and the
processes caught in the avalanche are operating on different
entities, so contention on a single entity seems unlikely.

It seems like some other common resource is blocking a bunch of
requests, which all end up timing out at the same moment.
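
One way to check that theory would be to time each put and log the
slow ones with a timestamp, then see whether stalls in unrelated
requests line up. A minimal sketch, assuming plain Python; timed_call,
SLOW_THRESHOLD_S and the logger name are made-up names for
illustration, and fn stands in for whatever write call the app makes
(for example entity.put):

```python
import logging
import time

# Hypothetical cutoff: normal puts take 0.1-0.15 s, so 1 s is clearly abnormal.
SLOW_THRESHOLD_S = 1.0

log = logging.getLogger("slow_puts")


def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds).

    Calls slower than SLOW_THRESHOLD_S are logged with their start time,
    including calls that end by raising (e.g. a deadline error).
    """
    start = time.time()
    try:
        result = fn(*args, **kwargs)
    finally:
        elapsed = time.time() - start
        if elapsed >= SLOW_THRESHOLD_S:
            name = getattr(fn, "__name__", repr(fn))
            log.warning("slow call %s: %.3fs (started at %.3f)",
                        name, elapsed, start)
    return result, elapsed
```

If the logged start times of slow calls from different handlers cluster
within the same second or two, that would point at a shared resource
rather than at any one entity.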

j

On Dec 2, 6:14 am, Per Larsson <[email protected]> wrote:
> I have reported this to the issue tracker here:
> http://code.google.com/p/googleappengine/issues/detail?id=4180
> Cross-posting to this list in case the community can help us out.
>
> Our app has fairly high traffic, about 15-20 QPS. We are experiencing
> short bursts of 500 errors that make our app completely unusable for
> short periods of time. We're not completely sure what's going wrong,
> but this is our guess:
>
> Assumption #1: App Engine will never start more instances than your
> current QPS.
>
> Assumption #2: One instance will never handle two requests
> concurrently (why?).
>
> Assumption #3: If your requests take more than one second to execute
> on average, there will not be enough instances to handle them, and
> they will fail with a 500 and the log message "Request was aborted
> after waiting too long to attempt to service your request...",
> sometimes tagged with throttle_code=2, whatever that means.
>
> Sometimes our application experiences short spikes in latency for
> datastore writes (we even see occasional DeadlineExceededExceptions
> in commits and puts). During these spikes we get an avalanche of 500
> "Request was aborted..." errors from lots of different URLs. My guess
> is that a few stalled requests lock up all our available instances.
> These spikes appear about once an hour and last for five minutes or
> so. Our requests normally run in under 1000 ms, with some headroom to
> spare. We have tried to work around the problem by optimizing our
> requests, but we can't really get away from the fact that our app
> needs to write to the datastore quite often. Attached is a screenshot
> from the dashboard showing simultaneous spikes in latency and error
> rates.
>
> [Attachment: spikes.png, 57K]
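
Per's three assumptions boil down to simple arithmetic: the number of
requests in flight is roughly QPS times average latency (Little's
law), so if instances are capped at the current QPS and each one
serves a single request at a time, anything over one second on average
leaves requests waiting. A rough sketch of that reasoning; the cap and
the one-request-per-instance rule are Per's guesses, not confirmed App
Engine behavior, and the function names are made up:

```python
import math


def instances_needed(qps, avg_latency_s):
    """Busy instances required if each instance serves one request at a time.

    Little's law: average requests in flight = arrival rate * time in system.
    The product is rounded first to avoid float noise pushing ceil() up by one.
    """
    return int(math.ceil(round(qps * avg_latency_s, 9)))


def is_overloaded(qps, avg_latency_s):
    """True if demand exceeds the assumed cap of one instance per unit of QPS."""
    max_instances = int(math.ceil(qps))  # assumption #1: instances <= QPS
    return instances_needed(qps, avg_latency_s) > max_instances
```

At 20 QPS and 0.8 s per request this needs 16 instances, which fits
under a cap of 20. If stalled writes push the average to 3 s, it needs
60, and under these assumptions the extra requests would queue and
eventually be aborted.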
