I have been having the same issue for two months now, and it's still
happening as of this writing.
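One mitigation that has helped me: retrying the put on a datastore-level timeout rather than letting the first spike become a user-facing 500. A minimal sketch follows; note this assumes a *datastore* timeout (on App Engine, `db.Timeout`), which is retryable within the same request, unlike the request-level `DeadlineExceededError`, which generally is not. `datastore_put` and the exception class here are stand-ins, not real App Engine APIs.

```python
import time

# Stand-in for the datastore timeout exception; on App Engine you would
# catch google.appengine.ext.db.Timeout instead.
class DeadlineExceededError(Exception):
    pass

def put_with_retries(put_fn, entity, attempts=3, backoff=0.1):
    """Retry a put with exponential backoff so a single transient latency
    spike doesn't immediately turn into a 500 for the user."""
    delay = backoff
    for attempt in range(attempts):
        try:
            return put_fn(entity)  # e.g. a wrapper around db.put(entity)
        except DeadlineExceededError:
            if attempt == attempts - 1:
                raise  # out of attempts; let the request fail
            time.sleep(delay)
            delay *= 2  # back off before trying again
```

This obviously doesn't fix the underlying spikes, and it trades extra request latency for fewer hard failures, so it only helps if your handlers have deadline headroom to spare.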

On Dec 3, 4:21 am, Ikai Lan <[email protected]> wrote:
> We're rolling back a change that was made earlier causing these issues. The
> error rates should be dying down. We should be operating normally soon.
>
> --
> Ikai
>
> On Thu, Dec 2, 2010 at 12:05 PM, Jason Collins
> <[email protected]> wrote:
>
>
>
> > I forgot to mention: the model that always seems to be part of this
> > avalanche has many composite indexes: 63.
>
> > j
>
> > On Dec 2, 2:01 pm, Jason Collins <[email protected]> wrote:
> > > Hi Per,
>
> > > We have been seeing similar bursts of DeadlineExceededErrors (DEE) on
> > > our application (appid: steprep), and we have been seeing this for
> > > some time (weeks).
>
> > > The symptom: datastore put()'s that normally take 100-150 ms to
> > > complete suddenly take 30+ seconds and cause a DEE. Usually, many
> > > processes are impacted at the same moment, yielding an "avalanche" of
> > > errors (I like your term!). We are not using entity groups, and the
> > > processes that are part of the avalanche are operating over different
> > > entities.
>
> > > It seems like some other common resource is blocking a bunch of
> > > requests, which all end up timing out at the same moment.
>
> > > j
>
> > > On Dec 2, 6:14 am, Per Larsson <[email protected]> wrote:
>
> > > > I have reported this to the issue tracker here:
> > > > http://code.google.com/p/googleappengine/issues/detail?id=4180
> > > > Cross-posting to this list in case the community can help us out....
>
> > > > Our app has fairly high traffic, about 15-20 QPS. We are experiencing
> > > > short bursts of 500 errors that make our app completely unusable for
> > > > short periods of time. We're not completely sure what's going wrong,
> > > > but this is our guess:
>
> > > > Assumption #1: App Engine will never fire up more instances than
> > > > your current QPS.
> > > > Assumption #2: One instance will never handle two requests
> > > > concurrently (why?).
> > > > Assumption #3: If your requests take more than one second to execute
> > > > on average, there will not be enough instances to handle them, and
> > > > they will fail with 500 and the log message "Request was aborted
> > > > after waiting too long to attempt to service your request...",
> > > > sometimes tagged with throttle_code=2, whatever that means.
>
> > > > Sometimes our application experiences short spikes in latency for
> > > > datastore writes (we even see occasional DeadlineExceededExceptions
> > > > in commits and puts). During these spikes we get an avalanche of 500
> > > > "Request was aborted..." errors from lots of different URLs. My guess
> > > > is that a few stalled requests lock up all our available instances.
> > > > These spikes appear about once an hour and last for five minutes or
> > > > so. Our requests normally run in under 1000 ms, with some headroom to
> > > > spare. We have tried to work around the problem by optimizing our
> > > > requests, but we can't really get away from the fact that our app
> > > > needs to write to the datastore quite often. Attached is a screenshot
> > > > from the dashboard showing simultaneous spikes in latency and error
> > > > rates.
>
> > > > [Attachment: spikes.png, 57K]
>
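Per's Assumptions #1-#3 above describe a queueing failure mode that is easy to sketch: if each instance serves one request at a time, a handful of stalled requests pins every instance, and everything queued behind them is aborted together after waiting too long. The toy greedy simulation below illustrates that guess; it is not App Engine's actual scheduler, and the numbers are made up.

```python
# Toy model: num_instances single-threaded instances, each request is
# (arrival_time, duration), and a queued request is aborted once its wait
# exceeds max_wait ("Request was aborted after waiting too long...").
def simulate(num_instances, requests, max_wait):
    """Return how many requests would be aborted for waiting too long."""
    free_at = [0.0] * num_instances  # time each instance becomes free
    aborted = 0
    for arrival, duration in sorted(requests):
        idx = min(range(num_instances), key=lambda i: free_at[i])
        start = max(arrival, free_at[idx])
        if start - arrival > max_wait:
            aborted += 1  # waited too long in the pending queue
            continue
        free_at[idx] = start + duration
    return aborted
```

With sub-second requests the model aborts nothing, but three simultaneous 30-second stalls on three instances cause every request arriving behind them to be aborted, which matches the "avalanche from lots of different URLs" symptom: the failing requests are innocent bystanders.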

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.
