[google-appengine] Re: High Latency Issues

Jason C Tue, 12 Oct 2010 08:14:02 -0700

We (appid steprep) are starting to see bursts of 30-s and 10-s
timeouts occur again. E.g.,


10-12 07:59AM 45.772 /_ah/queue/deferred 500 10012ms
10-12 07:59AM 43.185 /_ah/queue/deferred 500 10202ms
10-12 07:59AM 16.844 /_ah/queue/deferred 500 11799ms
10-12 07:57AM 44.693 /_ah/queue/deferred 500 29328ms
10-12 07:57AM 40.662 /_ah/queue/deferred 500 29515ms
10-12 07:57AM 29.826 /_ah/queue/deferred 500 35161ms
10-12 07:57AM 34.542 /_ah/queue/deferred 500 29803ms
10-12 07:57AM 28.881 /_ah/queue/deferred 500 33018ms
10-12 07:57AM 29.604 /_ah/queue/deferred 500 29740ms
10-12 07:57AM 29.205 /_ah/queue/deferred 500 29261ms
10-12 07:57AM 26.146 /_ah/queue/deferred 500 30481ms
10-12 07:57AM 25.589 /_ah/queue/deferred 500 29358ms
10-12 07:57AM 23.584 /_ah/queue/deferred 500 29611ms
10-12 07:57AM 41.243 /mgmt/alerts/trigger/ 500 10005ms
10-12 07:57AM 04.023 /_ah/queue/deferred 500 35939ms
10-12 07:57AM 05.307 /_ah/queue/deferred 500 29630ms
10-12 07:56AM 54.670 /_ah/queue/deferred 500 29673ms
10-12 07:56AM 52.672 /_ah/queue/deferred 500 29831ms
10-12 07:56AM 52.380 /_ah/queue/deferred 500 29921ms
10-12 07:56AM 52.379 /_ah/queue/deferred 500 29452ms
10-12 07:56AM 30.482 /_ah/queue/deferred 500 29534ms
10-12 07:56AM 28.382 /_ah/queue/deferred 500 29502ms
10-12 07:56AM 19.713 /_ah/queue/deferred 500 37933ms
10-12 07:56AM 24.241 /_ah/queue/deferred 500 29304ms
10-12 07:56AM 22.052 /_ah/queue/deferred 500 29793ms
10-12 07:56AM 18.893 /_ah/queue/deferred 500 31390ms
10-12 07:56AM 19.341 /_ah/queue/deferred 500 30034ms
10-12 07:56AM 18.722 /_ah/queue/deferred 500 29845ms
10-12 07:56AM 18.695 /_ah/queue/deferred 500 29467ms
10-12 07:56AM 33.521 /exec-reports/queue/increment-all-time-data-
counter 500 10027ms
10-12 07:54AM 12.524 /_ah/queue/deferred 500 29353ms
10-12 07:54AM 11.365 /_ah/queue/deferred 500 29925ms

and a little bit earlier:

10-12 07:48AM 42.456 /_ah/queue/deferred 500 10008ms
10-12 07:48AM 41.223 /_ah/queue/deferred 500 10007ms
10-12 07:48AM 38.840 /_ah/queue/deferred 500 10058ms
10-12 07:48AM 37.868 /_ah/queue/deferred 500 10033ms
10-12 07:48AM 37.283 /_ah/queue/deferred 500 10006ms
10-12 07:48AM 36.481 /_ah/queue/deferred 500 10008ms
10-12 07:48AM 36.218 /_ah/queue/deferred 500 10001ms
10-12 07:48AM 36.184 /mgmt/jobs/put/ 500 10027ms
10-12 07:48AM 35.902 /_ah/queue/deferred 500 10001ms
10-12 07:48AM 35.888 /_ah/queue/deferred 500 10002ms
10-12 07:48AM 35.792 /exec-reports/queue/increment-all-time-data-
counter 500 10007ms
10-12 07:48AM 34.955 /_ah/queue/deferred 500 10074ms
10-12 07:48AM 34.678 /_ah/queue/deferred 500 10007ms
10-12 07:48AM 33.743 /exec-reports/queue/increment-all-time-data-
counter 500 10005ms
10-12 07:48AM 31.323 /mgmt/jobs/put/deferred/?jobId=f54cddcf-05db-4cdc-
a2ec-3859f8cb970e 500 10045ms
10-12 07:48AM 30.146 /_ah/queue/deferred 500 10001ms
10-12 07:48AM 30.106 /_ah/queue/deferred 500 10039ms
10-12 07:48AM 29.551 /_ah/queue/deferred 500 10013ms
10-12 07:48AM 29.332 /mgmt/jobs/put/deferred/?jobId=1135b765-3199-4961-
a232-1ee46c235058 500 10036ms
10-12 07:48AM 27.608 /_ah/queue/deferred 500 10015ms
10-12 07:48AM 27.566 /mgmt/jobs/put/deferred/?
jobId=26addd00-9958-4747-9f48-b8f5acc61103 500 10012ms
10-12 07:48AM 27.172 /mgmt/jobs/put/deferred/?
jobId=8b69f017-1dc9-4b0f-8491-c3154a2b5dbb 500 10117ms
10-12 07:48AM 27.154 /mgmt/jobs/put/deferred/?jobId=5f387ec5-142a-43fc-
be04-5d911d34c528 500 10001ms
10-12 07:48AM 26.930 /exec-reports/queue/increment-daily-data-counter
500 10024ms
10-12 07:48AM 26.942 /exec-reports/queue/increment-all-time-data-
counter 500 10009ms
10-12 07:48AM 26.869 /mgmt/jobs/put/deferred/?
jobId=aa4d1923-0ce9-4ad5-88ef-7c835ba3db14 500 10031ms
10-12 07:48AM 26.377 /_ah/queue/deferred 500 10001ms
10-12 07:48AM 26.376 /exec-reports/queue/increment-all-time-data-
counter 500 10001ms
10-12 07:48AM 26.357 /_ah/queue/deferred 500 10020ms
10-12 07:48AM 26.347 /mgmt/jobs/put/deferred/?jobId=27b61fd1-
b980-4a8e-9b26-cb517a2fbe2a 500 10006ms
10-12 07:48AM 26.345 /_ah/queue/deferred 500 10006ms
10-12 07:48AM 26.342 /exec-reports/queue/increment-all-time-data-
counter 500 10008ms

There are all tasks that run quickly (i.e., much less than 30-s) under
"normal" operation. I can find more example if desired.

j

On Oct 11, 11:04 am, Jeff Schwartz <[email protected]> wrote:
> I am only guessing but I imagine that 'cluster' is some form of load
> balancing segment that Google uses to service and route requests to the
> virtual servers and this error/warning is indicative of when that queue is
> very busy. And regardless of if it is an error or a mere warning it still
> resulted in a 500, extremely high resource usage and a negative experience
> for the end user.
>
>
>
>
>
> On Mon, Oct 11, 2010 at 12:56 PM, Jamie H <[email protected]> wrote:
> > What do you mean by "cold instance startup that timed out" ?
>
> > Sometimes I will get the error:
> > Request was aborted after waiting too long to attempt to service your
> > request. This may happen sporadically when the App Engine serving
> > cluster is under unexpectedly high or uneven load. If you see this
> > message frequently, please contact the App Engine team.
>
> > I see this error occasionally when my site has NO load, so I assume
> > this is a cold instance startup that timed out?
>
> > I've been trying to find out about this issue for a while with little
> > response.  It is fairly troubling to see this error randomly in
> > several apps of mine.  I never noticed it until I realized it is a
> > "warning" not an "error" so you must set your log severity
> > appropriately to find these.  People have told me this error occurs
> > when all your apps instances are busy and requests wait too long in
> > queue for an instance.  However, I see it when my site has NO load so
> > I don't quite believe this is always the case...
>
> > On Oct 11, 11:30 am, Eli Jones <[email protected]> wrote:
> > > Seems like if you want direct one on one comunication and service
> > > guarantees, you gotta try to get into the Appengine for Business preview:
>
> > >http://code.google.com/appengine/business/
>
> > > <http://code.google.com/appengine/business/>But then.. you have to pay.
>
> > > Also, from seeing your logs below.. it really appears that the 8:07:46
> > log
> > > was for a cold instance startup that timed out, the 8:07:59 log is for a
> > > cold start that succeeded, and the 8:08:05 log is for a hot instance.
>
> > > Are you sure that log number 2 (for 8:07:59) didn't have this additional
> > > info when you expand it?:
>
> > > "This request caused a new process to be started for your application,
> > and
> > > thus caused your application code to be loaded for the first time. This
> > > request may thus take longer and use more CPU than a typical request for
> > > your application."
>
> > > On Mon, Oct 11, 2010 at 12:19 PM, Jeff Schwartz <[email protected]
> > >wrote:
>
> > > > Here is an example of my logs:
> > > > 1) 10-11 08:08AM 05.539 /account.groovy 200 218ms 265cpu_ms
> > 13api_cpu_ms
> > > > 2kb Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.10)
> > > > Gecko/20100914 Firefox/3.6.10 ( .NET CLR 3.5.30729; .NET4.0C),gzip(gfe)
> > 2)
> > > > 10-11 08:07AM 59.775 /account.groovy 200 2401ms 1976cpu_ms 13api_cpu_ms
> > > > 2kb Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.10)
> > > > Gecko/20100914 Firefox/3.6.10 ( .NET CLR 3.5.30729; .NET4.0C),gzip(gfe)
> > 3)
> > > > 10-11 08:07AM 46.856 /account.groovy 500 10015ms 0cpu_ms 0kb
> > Mozilla/5.0
> > > > (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.10) Gecko/20100914
> > > > Firefox/3.6.10 ( .NET CLR 3.5.30729; .NET4.0C),gzip(gfe) All of the
> > above
> > > > are for the exact same code operating on the exact same data.
>
> > > > Excluding #3 above, the server was already live and ready to receive
> > > > requests. Notice the variance in response time for #1 & #2. That is
> > alarming
> > > > don't you think?
>
> > > > This is not indicative of a stable environment where one should be
> > billed
> > > > for using it. Once issues like these are resolved then billing would be
> > > > entirely acceptable.
>
> > > > Again, I don't believe I am being overly critical but I have yet to
> > receive
> > > > a response from anyone at Google regarding my plea for assistance. If
> > they
> > > > would prefer to contact me off-line via email then that would be wholly
> > > > agreeable on my part.
>
> > > > Jeff
>
> > > >  --
> > > > You received this message because you are subscribed to the Google
> > Groups
> > > > "Google App Engine" group.
> > > > To post to this group, send email to [email protected]
> > .
> > > > To unsubscribe from this group, send email to
> > > > [email protected]<google-appengine%2Bunsubscrib
> > > >  [email protected]>
> > <google-appengine%[email protected]<google-appengine%252Bunsub 
> > [email protected]>
>
> > > > .
> > > > For more options, visit this group at
> > > >http://groups.google.com/group/google-appengine?hl=en.
>
> > --
> > You received this message because you are subscribed to the Google Groups
> > "Google App Engine" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to
> > [email protected]<google-appengine%2Bunsubscrib 
> > [email protected]>
> > .
> > For more options, visit this group at
> >http://groups.google.com/group/google-appengine?hl=en.
>
> --
> Jeff

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.

[google-appengine] Re: High Latency Issues

Reply via email to