I have been having the same issue for 2 months now, and it's still happening as of this writing.
On Dec 3, 4:21 am, Ikai Lan <[email protected]> wrote:
> We're rolling back a change that was made earlier that was causing these
> issues. The error rates should be dying down. We should be operating
> normally soon.
>
> --
> Ikai
>
> On Thu, Dec 2, 2010 at 12:05 PM, Jason Collins <[email protected]> wrote:
> > I forgot to mention: the model that always seems to be part of this
> > avalanche has many composite indexes: 63.
> >
> > j
> >
> > On Dec 2, 2:01 pm, Jason Collins <[email protected]> wrote:
> > > Hi Per,
> > >
> > > We have been seeing similar bursts of DeadlineExceededErrors (DEE) on
> > > our application (appid: steprep), and we have been seeing them for
> > > some time (weeks).
> > >
> > > The symptom: datastore put()s that normally take 100-150 ms to
> > > complete suddenly take 30 s or more and cause a DEE. Usually, many
> > > processes are impacted at the same moment, yielding an "avalanche" of
> > > errors (I like your term!). We are not using entity groups, and the
> > > processes that are part of the avalanche are operating over different
> > > entities.
> > >
> > > It seems like some other common resource is blocking a bunch of
> > > requests, which all end up timing out at the same moment.
> > >
> > > j
> > >
> > > On Dec 2, 6:14 am, Per Larsson <[email protected]> wrote:
> > > > I have reported this to the issue tracker here:
> > > > http://code.google.com/p/googleappengine/issues/detail?id=4180
> > > > Cross-posting to this list in case the community can help us out.
> > > >
> > > > Our app has fairly high traffic, about 15-20 QPS. We are
> > > > experiencing short bursts of 500 errors that make our app
> > > > completely unusable for short periods of time. We're not completely
> > > > sure what's going wrong, but this is our guess:
> > > >
> > > > Assumption #1: App Engine will never fire up more instances than
> > > > your current QPS.
> > > > Assumption #2: One instance will never handle two requests
> > > > concurrently (why?).
> > > > Assumption #3: If your requests take more than one second to
> > > > execute on average, there will not be enough instances to handle
> > > > them, and they will fail with a 500 and the log message "Request
> > > > was aborted after waiting too long to attempt to service your
> > > > request...", sometimes tagged with throttle_code=2, whatever that
> > > > means.
> > > >
> > > > Sometimes our application experiences short spikes in latency for
> > > > datastore writes (we even see occasional DeadlineExceededExceptions
> > > > in commits and puts). During these spikes we get an avalanche of
> > > > 500 "Request was aborted..." errors from lots of different URLs. My
> > > > guess is that a few stalled requests lock up all our available
> > > > instances. These spikes appear about once an hour and last for five
> > > > minutes or so. Our requests normally run in under 1000 ms, with
> > > > some headroom to spare. We have tried to work around the problem by
> > > > optimizing our requests, but we can't really get away from the fact
> > > > that our app needs to write to the datastore quite often.
> > > >
> > > > Attached is a screenshot from the dashboard showing simultaneous
> > > > spikes in latency and error rates.
> > > >
> > > > [Attachment: spikes.png, 57K]
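For what it's worth, the arithmetic behind assumptions #1-#3 can be sketched with Little's law: the number of requests in flight (and, under assumption #2, the number of busy instances) is roughly arrival rate times mean latency. A rough illustration, using the QPS and latency figures from this thread (the formula is the assumption, not anything App Engine documents):

```python
def instances_needed(qps, mean_latency_s):
    """Little's law: requests in flight = arrival rate * mean latency.

    With one request per instance (assumption #2), this is also the
    number of instances that must be serving to avoid queueing."""
    return qps * mean_latency_s

# Normal operation: ~20 QPS at under 1 s per request.
normal = instances_needed(20, 1.0)   # ~20 instances in flight

# During a datastore latency spike: the same traffic at 30 s per put.
spike = instances_needed(20, 30.0)   # ~600 instances in flight

print(normal, spike)
```

That jump from ~20 to ~600 would explain why a five-minute latency spike is enough to exhaust the instance pool and produce the "Request was aborted after waiting too long..." avalanche.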
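One workaround people use while a spike is in progress is to retry the failing put a few times with backoff instead of letting one call eat the whole request deadline. A minimal sketch of that pattern; the `DeadlineExceeded` class and `flaky_put` below are stand-ins invented for illustration (in a real App Engine app the exception would be the runtime's DeadlineExceededError and the call would be a datastore put), and the backoff values are arbitrary:

```python
import time

class DeadlineExceeded(Exception):
    """Stand-in for the App Engine runtime's deadline exception."""

def put_with_retry(put_fn, retries=3, backoff_s=0.05):
    """Retry a datastore-style put with exponential backoff.

    During a transient latency spike, a quick retry may succeed;
    giving up after a few attempts frees the instance sooner."""
    for attempt in range(retries):
        try:
            return put_fn()
        except DeadlineExceeded:
            if attempt == retries - 1:
                raise
            time.sleep(backoff_s * (2 ** attempt))

# Illustration: a hypothetical put that fails twice, then succeeds.
calls = {"n": 0}
def flaky_put():
    calls["n"] += 1
    if calls["n"] < 3:
        raise DeadlineExceeded()
    return "ok"

print(put_with_retry(flaky_put))  # "ok" on the third attempt
```

This doesn't address the root cause the thread is chasing (all instances stalling at once), but it can shave the per-request damage when only some backends are slow.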
