Yes, I'm still seeing problems. We've been averaging 4 seconds per request for the past six hours, roughly tracking the elevated latency that you say is within tolerance and is averaging 250 ms per request. Some of that is due to a recently identified performance issue: one of our common handlers has been flagged as a "high CPU" handler and sidelined to make room for other apps' more performant handlers. We haven't been able to fix that yet, but it's not just that handler: it's every dynamic handler.

This isn't just an issue since last night, though; it's been on and off for two months. We're still seeing widely variable response times on almost every request. In the admin console, for example, most requests take a normal second or so, but roughly a fifth of them take 7-20 seconds to respond. This is stuff like expanding one log entry with no log messages in it, so I know that's not in my code.

Is it instance startup cost? I think so. From my logs, I'll load something simple like a FAQ page and it's fine, but whenever App Engine has to start another instance and import Django and some views, that's 7-14 seconds.
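To make that concrete, here's the kind of minimal instrumentation I have in mind for separating cold-start import cost from the handler itself. This is just a sketch: the commented-out models import and the log format are placeholders, not our real code.

    # main.py -- minimal sketch of timing imports vs. handler work.
    import time
    _module_load_start = time.time()

    import logging
    import django          # Django 0.96, as bundled with the runtime
    # import models        # placeholder for our own models module

    # Module-level code runs once per instance, so on warm requests
    # this value is whatever import cost the instance paid at startup.
    _import_ms = (time.time() - _module_load_start) * 1000

    def main():
        handler_start = time.time()
        print 'Content-Type: text/plain'
        print ''
        print 'ok'
        logging.info('imports: %.0f ms, handler body: %.0f ms',
                     _import_ms, (time.time() - handler_start) * 1000)

    if __name__ == '__main__':
        main()

On a warm instance the handler body is all that runs per request; a fresh instance should show the 7-14 seconds landing in the import step rather than in anything the request itself does.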
Have we changed anything in the past two months? A couple of very small things, but most of our development has been on another branch. It swings between "working just fine on every request" and "several seconds to spawn an instance running the same code as usual" from day to day and hour to hour, and it's been getting more frequent and worse as time goes on.

The simplest handler we have, which imports our models, reads one entity at random from the datastore, and returns it, with no web framework, normally runs in 50-250 ms and takes 50-80 ms of CPU, but sometimes (a new instance?) takes 1300-2000 ms and 130-300 ms of CPU. As a test, I made an even simpler handler, with no imports of our own code, that just prints a random number; starting a new instance for that only takes a 70-250 ms response time. And if I make it import our models on every request, nothing changes.
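For reference, the two test handlers are shaped roughly like this. The model here is a stand-in for one of ours, and the pick-a-random-entity-by-offset trick is simplified (it assumes at least one entity exists):

    # simplest.py -- rough shape of the "simplest handler": import
    # models, read one random entity from the datastore, return it.
    import random
    from google.appengine.ext import db

    class Thing(db.Model):          # stand-in for one of our models
        value = db.StringProperty()

    def main():
        print 'Content-Type: text/plain'
        print ''
        count = Thing.all().count(1000)  # assumes count >= 1
        thing = Thing.all().fetch(1, offset=random.randrange(count))[0]
        print thing.value

    if __name__ == '__main__':
        main()

    # noop.py -- the control handler: no app imports, just a number.
    import random

    def main():
        print 'Content-Type: text/plain'
        print ''
        print random.random()

    if __name__ == '__main__':
        main()

The only difference between the two is the model import and the single datastore read, which is why the order-of-magnitude cold-start gap between them is so strange.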
It's almost as if the cost of starting a new instance is multiplied by the cost of the instantiation code, multiplied by whether App Engine is feeling cheerful that hour. It could be our code; I could see that. If there's a reason why the admin console would be slow at the same times and in the same way as the rest of our site, maybe that would shed some light. If we're doing something subtly wrong when we import Django (0.96) and our models, I could see that too, but only if there's a reason why we'd see it sometimes and not at other times, and why it would have been getting worse without any code changes.

It would also help a lot to know exactly how this works (from the Quotas page): "Applications that are heavily cpu-bound, on the other hand, may incur some additional latency in long-running requests in order to make room for other apps sharing the same servers." Marzia has said that this is per-request, so I'm assuming that certain handlers get sidelined, but this is happening site-wide -- otherwise, it would make perfect sense. How is a handler identified as high-CPU? How long does that classification last?

Thanks,
--Nick (app id: skrit)

On Mar 3, 1:25 pm, Brett Slatkin <[email protected]> wrote:
> Hi Nick,
>
> On Tue, Mar 3, 2009 at 8:55 AM, Nick Winter <[email protected]> wrote:
>
> >http://code.google.com/status/appengine/detail/serving/2009/03/03#ae-...
>
> > Just about every day for the past several weeks, there's been elevated
> > latency like this, usually at similar times of day. It was unfortunate
> > and frustrating before, but since last night our development is
> > stalled because every part of App Engine is too slow to do any testing
> > or data manipulation right now. 5 seconds per request?
>
> > I'm confident that the App Engine team will get a handle on the
> > performance and everything will be shiny once more, but it'd be nice
> > to hear some word as to what's going on. Are the servers just
> > overloaded? Did something go wrong with the maintenance last night?
> > Is anomaly-yellow serving to be expected?
>
> We had some unexpected issues during the maintenance last night which
> caused elevated latencies and errors for all applications. We resolved
> the issue around 8:45pm last night and things have returned to normal
> since. Please let me know if you're still seeing any problems.
>
> As for the elevated latency for the dynamic request metric (that you
> linked to), this is primarily a product of alert tolerances. We're
> still tuning our status site metrics to match real-world expectations
> of App Engine performance. You'll notice today that we've raised some
> of these tolerances by a little bit, causing many of the lines to go
> back to a blue color (i.e., everything OK).
>
> -Brett
