Hi Peter,

I'm sure it's extremely(!) hard to host this many applications reliably and at a sane cost. I won't try to make any suggestions about that. But what concerns me is the support situation on App Engine. It shouldn't be too hard to monitor a low-bandwidth forum like this, and to provide somewhat timely feedback when downtime is being reported by some of the most senior users, during a time when you're actually rolling out changes.
I can understand that you don't want to turn this into an official support forum for all the dumb questions we see here, and I can understand that you want to encourage us to purchase premium accounts. But I feel you're hurting your own business by not responding swiftly to posts like these. We've been through similar issues with our application, and the support situation is really the one thing that stops me from recommending App Engine wholeheartedly. I bet I'm not the only one. :)

When discussing your strategies, it would be great if you could consider "improved responsiveness" too.

Cheers,
Per

On Thursday, September 13, 2012 8:12:35 PM UTC+2, psm wrote:
>
> Jeff,
>
> these are good ideas and suggestions. we are working on a number of
> different strategies to ameliorate these issues. some of the items you are
> suggesting are already in progress, and others besides. and i agree that
> this is a general philosophical challenge with PaaS. on GAE we now
> regularly serve several hundreds of thousands of applications, so it is
> indeed a challenge to handle the "long tail" problem. we are aware of
> this, and you should expect us to be rolling out a number of things to
> address it. in fact, we expect to make our experience of running this
> large workload over a long period of time into an advantage with GAE.
>
> Peter S Magnusson
> (GAE Eng Dir)
>
>
> On Wednesday, September 12, 2012 5:35:39 PM UTC-7, Jeff Schnitzer wrote:
>>
>> On Wed, Sep 12, 2012 at 3:12 PM, Kaan Soral <[email protected]> wrote:
>> > This is why I love App Engine, when a problem occurs instead of having a
>> > heart attack or committing suicide, you can just wait for it to be resolved.
>>
>> Hmmm. This really unfortunately timed incident may have cost us an
>> important client, so I'm not feeling the love.
>>
>> I have quite a lot of experience building and running large online
>> systems prior to embracing GAE and my products have never had as much
>> downtime as I've had over the last year. It hasn't always been
>> Google's fault (the entire .st registry going down for 8+ hours really
>> sucked[1]) but it usually has been. See:
>>
>> * Instance startup time ballooning by 3X and hitting deadlines
>> (multiple occasions)
>> * GAE blocking CloudFlare with an undocumented security system
>> * This incident, where Java instances started mysteriously failing
>>
>> Would waiting have fixed these issues? I'm not convinced. Google may
>> have smart people running GAE but they aren't watching _my_ app,
>> they're just watching for an uptick in the number of complaints. If
>> you're doing something slightly unusual (say, running a CF reverse
>> proxy), you might be statistical noise. Apparently this Java problem
>> _was_ widespread, but I had no way of knowing that.
>>
>> GAE's value proposition is that it's better to have Google's smart
>> engineers building and maintaining your infrastructure. But my site
>> would be more reliable if I had one dumb person (possibly me) who
>> cares specifically about _my_ infrastructure. I've screwed up
>> deployments and upgrades in production before, but at least I'm aware
>> when changes happen, get immediate feedback, and can fix the problem
>> right then and there.
>>
>> With GAE, the only thing I can do when my alarms go off is to whine as
>> loudly as possible. But there is no feedback! I have no way of
>> knowing if Google is working on the problem or if they're still
>> waiting for more complaints that will never materialize. Will I be
>> down for 15 minutes, 1 hour, 2 hours, 8 hours, forever? How long do
>> you want to wait?
>>
>> This feels like a fundamental flaw in the PaaS concept, destined to
>> produce multiple-hour downtimes at irregular intervals.
>> The feedback loop is too slow (and lossy if the problem is not widespread).
>> There's no amount of QA or testing that will prevent failures in a
>> system as big and complicated as GAE. So the only reasonable option is
>> to get that feedback loop shorter. How can that happen? Some ideas:
>>
>> * Google could announce when they are rolling out changes. I don't
>> need release notes (although it would be nice to know what to watch
>> for) but I'd like to know when I should pay extra attention. Or not
>> schedule client demos. Facebook does something like this, rolling out
>> platform changes on specific days of the week (which I long ago
>> stopped caring about).
>>
>> * Google could make extra support channels available during this
>> time. Hell, use twitter. Think of us as your QA staff - if we see
>> something amiss, we'd like to let you know.
>>
>> * Google could be more transparent about problems as they happen.
>> When you know there is an issue, let us know. Since I must assume
>> that any problem which Google hasn't acknowledged is a problem Google
>> doesn't know about, I can stop spamming @google.com addresses.
>>
>> * Google could monitor our apps, and compare error rates before
>> rollout to error rates after rollout. Ideally you'd break this down
>> by component; figure out which apps use the search api, so when you
>> roll out changes to the search system, you're specifically watching
>> for an uptick in 500 errors from those apps. Something like that.
>>
>> Any other ideas? I really like GAE and I really like the PaaS
>> concept. But reliability is really a problem. It's probably going to
>> be an even bigger problem going on into the future as GAE (hopefully)
>> adds new features and gets a bigger footprint. More moving parts
>> means more failures.
>>
>> Jeff
>>
>> P.S. Paying $6k/yr for Premier Support is not the answer. Whether or
>> not that would solve my problem, that doesn't solve GAE's problem.
>>
>> [1]:
>> http://blorn.com/post/29851770158/beware-cutesy-two-letter-tlds-for-your-domain-name
>>
