I forgot to mention: the model that always seems to be part of this avalanche has a lot of composite indexes (63).
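One reason a model with 63 composite indexes may suffer most during datastore slowdowns is write amplification. Every put() also has to write the entity's rows in each of those indexes. The sketch below counts the composite-index rows for a single entity. It relies on App Engine's documented index behavior: an index gets one row per combination of values of its properties (the "exploding index" case for list properties), and no row if the entity lacks one of the indexed properties. The entity and index definitions are hypothetical. Built-in single-property index rows are not counted.

```python
from math import prod
from typing import Dict, List, Sequence, Tuple

# Hypothetical model: an entity is a dict mapping property name -> list of values
# (single-valued properties are one-element lists). A composite index is a tuple
# of property names, as it would be declared in the app's index configuration.
Entity = Dict[str, List[object]]
CompositeIndex = Tuple[str, ...]


def composite_index_rows(indexes: Sequence[CompositeIndex], entity: Entity) -> int:
    """Count the composite-index rows a put() of `entity` has to write.

    Each index contributes the product of the number of values of its
    properties: 1 row if all are single-valued, one row per combination of
    values for list properties, and 0 if the entity lacks an indexed property.
    """
    return sum(prod(len(entity.get(name, [])) for name in index) for index in indexes)


if __name__ == "__main__":
    entity = {"owner": ["u1"], "created": ["2010-12-02"],
              "tags": ["a", "b", "c"], "labels": ["x", "y"]}
    # 63 indexes over single-valued properties: one row each.
    print(composite_index_rows([("owner", "created")] * 63, entity))  # 63
    # List properties multiply: 3 * 1 + 3 * 2 rows.
    print(composite_index_rows([("tags", "created"), ("tags", "labels")], entity))  # 9
```

Even in the best case, with no list properties, each put() of this model touches at least 63 extra index rows. Any list-valued property in those indexes multiplies that number.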
j

On Dec 2, 2:01 pm, Jason Collins <[email protected]> wrote:
> Hi Per,
>
> We have been seeing similar bursts of DeadlineExceededErrors (DEE) on
> our application (appid: steprep), and we have been seeing this for
> some time (weeks).
>
> The symptom: datastore put()s that normally take 100-150 ms to
> complete suddenly take 30+ s and cause a DEE. Usually, many processes
> are impacted at the same moment, yielding an "avalanche" of
> errors (I like your term!). We are not using Entity Groups, and the
> processes that are part of the avalanche are operating on different
> entities.
>
> It seems like some other common resource is blocking a bunch of
> requests, which all end up timing out at the same moment.
>
> j
>
> On Dec 2, 6:14 am, Per Larsson <[email protected]> wrote:
> > I have reported this to the issue tracker
> > here: http://code.google.com/p/googleappengine/issues/detail?id=4180.
> > Cross-posting to this list in case the community can help us out....
> >
> > Our app has fairly high traffic, about 15-20 QPS. We are experiencing
> > short bursts of 500 errors that make our app completely unusable for
> > short periods of time. We're not completely sure what's going wrong,
> > but this is our guess:
> >
> > Assumption #1: App Engine will never fire up more instances than
> > your current QPS.
> > Assumption #2: One instance will never handle two requests
> > concurrently (why?).
> > Assumption #3: If your requests take more than one second to execute
> > on average, there will not be enough instances to handle them, and
> > they will fail with a 500 and the log message "Request was aborted
> > after waiting too long to attempt to service your request...",
> > sometimes tagged with throttle_code=2, whatever that means.
> >
> > Sometimes our application experiences short spikes in latency for
> > datastore writes (we even see occasional DeadlineExceededExceptions
> > in commits and puts). During these spikes we get an avalanche of 500
> > "Request was aborted..." errors from lots of different URLs. My guess
> > is that a few stalled requests lock up all our available instances.
> > These spikes appear about once an hour and last for five minutes or
> > so. Our requests normally run in under 1000 ms, with some headroom
> > to spare. We have tried to work around the problem by optimizing our
> > requests, but we can't really get away from the fact that our app
> > needs to write to the datastore quite often. Attached is a screenshot
> > from the dashboard showing simultaneous spikes in latency and error
> > rates.
> >
> > spikes.png (57K)

-- 
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
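Per's assumptions #1 to #3 amount to a capacity argument that Little's law makes concrete. The average number of requests in flight equals the arrival rate (QPS) times the mean latency. If each instance serves one request at a time, that product is also the number of instances that must be busy. The Python sketch below is only an illustration. Three of its inputs are hypothetical: the 0.8 s normal latency, the 10% stall rate, and an instance cap equal to QPS. The cap comes from Per's unconfirmed guess, not from documented App Engine scheduler behavior.

```python
import math

# Little's law: requests in flight L = arrival rate (QPS) x mean latency (s).
# Under Per's assumption #2 (one request per instance at a time), L is also
# the number of instances that must be busy to keep up with the traffic.


def mean_latency(normal_s: float, stalled_s: float, stalled_fraction: float) -> float:
    """Average latency when a fraction of requests stall (e.g. on a slow put())."""
    return (1.0 - stalled_fraction) * normal_s + stalled_fraction * stalled_s


def busy_instances(qps: float, latency_s: float) -> int:
    """Instances needed to serve `qps` requests/s that each take `latency_s` s."""
    # Round away float noise (e.g. 20 * 0.8) before taking the ceiling.
    return math.ceil(round(qps * latency_s, 9))


def is_overloaded(qps: float, latency_s: float, instance_cap: int) -> bool:
    """True if demand exceeds a (hypothetical) cap on instances, so requests queue."""
    return busy_instances(qps, latency_s) > instance_cap


if __name__ == "__main__":
    qps = 20        # upper end of the 15-20 QPS quoted in the thread
    cap = qps       # assumption #1 (Per's guess): instances <= current QPS

    normal = mean_latency(0.8, 30.0, 0.0)   # hypothetical: every request ~0.8 s
    spike = mean_latency(0.8, 30.0, 0.10)   # hypothetical: 10% hit a 30 s put()

    print(busy_instances(qps, normal), is_overloaded(qps, normal, cap))  # 16 False
    print(busy_instances(qps, spike), is_overloaded(qps, spike, cap))    # 75 True
```

With these numbers, a tenth of requests stalling on a 30 s put() raises the average latency to about 3.7 s. The required instance count goes from 16 to 75. This is one way a handful of slow datastore writes could starve every other URL and produce the burst of "Request was aborted..." errors described in the thread.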
