Some additional observations and questions...

After reading this Stack Overflow question [Link 1], which mentions an issue with having your Max Idle Instances count below 6, we started looking at the warmup requests on our staging environment, because that app-id has Idle Instances set to Auto-Auto while production had specific values.
But... where did all the "/_ah/warmup" requests go? When we did a label search of the staging logs for "path:/_ah/warmup", we couldn't find a single warmup request! (Yes, we have warmup requests turned on.) We would just see the first cold-start request take around 15 seconds to load on F1 and around 10 seconds on F2. I even shut down every instance and hit the staging server again to see if I could find a warmup request in the logs... nope.

Honestly, I would rather have a user wait 10 seconds for the first request to that server than risk the warmup requests failing again.

So, where did all the "/_ah/warmup" requests go? More importantly, why would we have such different times for warmup requests compared to cold starts? Shouldn't they be nearly identical?!

Rock on,
-Hardwick

[Link 1] - http://stackoverflow.com/questions/9422698/ah-warmup-producing-harddeadlineexceedederror

On Jul 12, 12:26 pm, David Hardwick wrote:
> Hello,
>
> I realize there's been a lot of discussion on this forum recently about
> startup times being exceeded, but I needed to post the experience we had
> this morning to keep attention on this important issue.
>
> We uploaded a point release of our app to a "not-live" version this morning
> and, of course, we were going to click around on that instance to make sure
> it's all kosher before making that version "live." The warm-up requests
> for the "not-live" version were exceeding the deadline limit of 60s...
> __and__we__are__on__F4s__!_!.
>
> However, the LIVE version of the app crashed too: 500 server errors,
> instance counts went to zero, all sorts of whacky stuff was seen in the
> control panel. All of that happened to our LIVE version when all we
> did was upload another "non-live" version and hit it with a single
> request... did I mention we were on F4s? ;-) Does any instance
> exceeding the 60s limit take down all instances, including the live
> one?
>
> We did a few things as quickly as possible since our live application was
> down, so clearly we didn't have time to take the scientific approach of
> changing only one thing at a time and waiting to see if that did it.
>
> We...
> 1. Switched from F4s to F2s (I figured this would at least get us onto
> some new servers/instances)
> 2. Increased max idle instances from 1 to 2 (with F4s running, I'm fine
> with having just 1 idle instance and not at all happy about paying for 2
> idle instances, so maybe we'll just increase this prior to deployments and
> then bring it back down after the deployment succeeds, until we know more)
> 3. Made the recently uploaded version live (hey, why not, the production
> app was down for 10 minutes, so how much more harm could we do?)
>
> We use GWT and Guice, and we jar everything (I have been paying attention
> to these startup-time discussions for quite some time now). We are also
> considering switching our Guice libraries to a non-AOP version, as we saw
> suggested in another blog, since we just need the injection.
>
> Any insight? I'm all ears! app_id=s~myflashpanel
>
> Regards,
> -Hardwick
>
> --
> *David Hardwick*
> *CTO*
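
On the question of why warmup and cold-start times differ: one way to narrow it down is to time each initialization step and log the breakdown, so the numbers from a warmup request and from a cold-start request can be compared side by side. Below is a minimal sketch using only the Java standard library (Java 8+ for the lambdas). `StartupTimer` and the phase names are hypothetical, and the sleeps are placeholders for real work such as building the Guice injector; nothing here is an App Engine API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: records how long each startup phase takes. */
public class StartupTimer {
    // LinkedHashMap keeps phases in the order they were run.
    private final Map<String, Long> phaseMillis = new LinkedHashMap<>();

    /** Runs one startup phase and records its wall-clock duration in ms. */
    public void time(String phase, Runnable work) {
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            // Recorded even if the phase throws, so failures still show up.
            phaseMillis.put(phase, (System.nanoTime() - start) / 1_000_000L);
        }
    }

    /** Read-only view of phase name to elapsed milliseconds. */
    public Map<String, Long> phases() {
        return Collections.unmodifiableMap(phaseMillis);
    }

    /** One-line summary suitable for a single log statement. */
    public String summary() {
        StringBuilder sb = new StringBuilder("startup phases (ms):");
        for (Map.Entry<String, Long> e : phaseMillis.entrySet()) {
            sb.append(' ').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        StartupTimer timer = new StartupTimer();
        // Placeholders: substitute the app's real initialization steps.
        timer.time("load-config", () -> sleep(20));
        timer.time("build-injector", () -> sleep(50));
        System.out.println(timer.summary());
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Calling the same timed initialization from both the warmup path and the first-request path, and logging `summary()` in each, would show whether one phase accounts for the gap or whether the extra time is spent outside the application's own code.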
