I managed to bring my service back up by following the steps below, but what happened remains a mystery to me. Here is the history of events in case it helps anybody in the future, or in case anybody has a theory or anybody at Google cares to follow up.
Context

1. According to my monitoring service and my customers' experience, the service has been up for months, with regular updates etc. It is a low-traffic site but very stable, and a few customers depend on it for their day-to-day business. It is Python, HRD, and the processes I use have been time-tested for years.
2. 2013-08-08 01:58:37 I upload a new version and make it the default (this is more than a day before the outage, but I mention it for context).

Outage

3. 2013-08-09 15:58:43 Pingdom pages me, reporting an outage lasting more than a minute. I test it myself and indeed see 500 errors on multiple endpoints. I check the Google status page; it shows everything green.
4. 2013-08-09 16:55 I post the initial message on this thread while heading to a location where I can access a computer. Pingdom up and down messages keep coming every few minutes, and customers inquire...
5. 2013-08-09 17:23 I open a production issue describing the problem. It is still open and marked critical; no one has ever contacted me.
6. By now I am back at a computer with an internet connection and think about what I can do on my own. I remember that I had missed the version 1.8.3 release and that my last upload was built with 1.8.2. So, as a last step before posting again, I download 1.8.3 and rebuild my service using 1.8.3.
7. 2013-08-09 17:32:44 I upload the service: same version, same bits, but built using 1.8.3. This fixes the issue almost immediately.

Fortunately, I was relatively close to a computer and an internet connection, and the outage lasted only about 90 minutes. This could have been much worse…

PK
http://www.gae123.com

On August 9, 2013 at 3:39:13 PM, timh ([email protected]) wrote:

I am not seeing any actual issues (nothing in the logs), but I have noticed that all my instances are getting shut down and restarted about every 30 minutes. Normally instances are at least half a day old. Billing is enabled, but it is a low-traffic site.
T

On Saturday, August 10, 2013 7:55:06 AM UTC+8, PK wrote:

For the past 40 minutes my app has been going up and down with logs showing: "a problem was encountered with the process..." This is a paid python app with a long track record of stability. I last modified source code 24 hours ago so I doubt it is a fault on our end. Anybody else with similar issues today?

PK
http://www.gae123.com

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at http://groups.google.com/group/google-appengine. For more options, visit https://groups.google.com/groups/opt_out.
