point taken. i agree.

On Thu, Sep 13, 2012 at 12:11 PM, Per <[email protected]> wrote:
> Hi Peter,
>
> I'm sure it's extremely(!) hard to host this many applications reliably
> and at a sane cost. I won't try to make any suggestions about that. But
> what concerns me is the support situation on App Engine. It shouldn't be
> too hard to monitor a low-bandwidth forum like this, and to provide
> somewhat timely feedback when downtime is being reported by some of the
> most senior users, during a time when you're actually rolling out changes.
>
> I can understand that you don't want to turn this into an official support
> forum for all the dumb questions we see here, and I can understand that you
> want to encourage us to purchase premium accounts. But I feel you're
> hurting your own business by not responding swiftly to posts like these.
> We've been through similar issues with our application, and the support
> situation is really the one thing that stops me from recommending App
> Engine wholeheartedly. I bet I'm not the only one. :) When discussing your
> strategies, it would be great if you could consider "improved
> responsiveness" too.
>
> Cheers,
> Per
>
> On Thursday, September 13, 2012 8:12:35 PM UTC+2, psm wrote:
>>
>> Jeff,
>>
>> these are good ideas and suggestions. we are working on a number of
>> different strategies to ameliorate these issues. some of the items you are
>> suggesting are already in progress, and others besides. and i agree that
>> this is a general philosophical challenge with PaaS. on GAE we now
>> regularly serve several hundreds of thousands of applications, so it is
>> indeed a challenge to handle the "long tail" problem. we are aware of
>> this, and you should expect us to be rolling out a number of things to
>> address it. in fact, we expect to make our experience of running this
>> large workload over a long period of time into an advantage with GAE.
>>
>> Peter S Magnusson
>> (GAE Eng Dir)
>>
>>
>> On Wednesday, September 12, 2012 5:35:39 PM UTC-7, Jeff Schnitzer wrote:
>>>
>>> On Wed, Sep 12, 2012 at 3:12 PM, Kaan Soral <[email protected]> wrote:
>>> > This is why I love App Engine, when a problem occurs instead of having a
>>> > heart attack or committing suicide, you can just wait for it to be
>>> > resolved.
>>>
>>> Hmmm. This really unfortunately timed incident may have cost us an
>>> important client, so I'm not feeling the love.
>>>
>>> I have quite a lot of experience building and running large online
>>> systems prior to embracing GAE and my products have never had as much
>>> downtime as I've had over the last year. It hasn't always been
>>> Google's fault (the entire .st registry going down for 8+ hours really
>>> sucked[1]) but it usually has been. See:
>>>
>>> * Instance startup time ballooning by 3X and hitting deadlines
>>> (multiple occasions)
>>> * GAE blocking CloudFlare with an undocumented security system
>>> * This incident, where Java instances started mysteriously failing
>>>
>>> Would waiting have fixed these issues? I'm not convinced. Google may
>>> have smart people running GAE but they aren't watching _my_ app,
>>> they're just watching for an uptick in the number of complaints. If
>>> you're doing something slightly unusual (say, running a CF reverse
>>> proxy), you might be statistical noise. Apparently this Java problem
>>> _was_ widespread, but I had no way of knowing that.
>>>
>>> GAE's value proposition is that it's better to have Google's smart
>>> engineers building and maintaining your infrastructure. But my site
>>> would be more reliable if I had one dumb person (possibly me) who
>>> cares specifically about _my_ infrastructure. I've screwed up
>>> deployments and upgrades in production before, but at least I'm aware
>>> when changes happen, get immediate feedback, and can fix the problem
>>> right then and there.
>>>
>>> With GAE, the only thing I can do when my alarms go off is to whine as
>>> loudly as possible. But there is no feedback! I have no way of
>>> knowing if Google is working on the problem or if they're still
>>> waiting for more complaints that will never materialize. Will I be
>>> down for 15 minutes, 1 hour, 2 hours, 8 hours, forever? How long do
>>> you want to wait?
>>>
>>> This feels like a fundamental flaw in the PaaS concept, destined to
>>> produce multiple-hour downtimes at irregular intervals. The feedback
>>> loop is too slow (and lossy if the problem is not widespread).
>>> There's no amount of QA or testing that will prevent failures in a
>>> system as big and complicated as GAE. So the only reasonable option is
>>> to get that feedback loop shorter. How can that happen? Some ideas:
>>>
>>> * Google could announce when they are rolling out changes. I don't
>>> need release notes (although it would be nice to know what to watch
>>> for) but I'd like to know when I should pay extra attention. Or not
>>> schedule client demos. Facebook does something like this, rolling out
>>> platform changes on specific days of the week (which I long ago
>>> stopped caring about).
>>>
>>> * Google could make extra support channels available during this
>>> time. Hell, use twitter. Think of us as your QA staff - if we see
>>> something amiss, we'd like to let you know.
>>>
>>> * Google could be more transparent about problems as they happen.
>>> When you know there is an issue, let us know. Since I must assume
>>> that any problem which Google hasn't acknowledged is a problem Google
>>> doesn't know about, I can stop spamming @google.com addresses.
>>>
>>> * Google could monitor our apps, and compare error rates before
>>> rollout to error rates after rollout.
>>> Ideally you'd break this down
>>> by component; figure out which apps use the search api, so when you
>>> roll out changes to the search system, you're specifically watching
>>> for an uptick in 500 errors from those apps. Something like that.
>>>
>>> Any other ideas? I really like GAE and I really like the PaaS
>>> concept. But reliability is really a problem. It's probably going to
>>> be an even bigger problem going on into the future as GAE (hopefully)
>>> adds new features and gets a bigger footprint. More moving parts
>>> means more failures.
>>>
>>> Jeff
>>>
>>> P.S. Paying $6k/yr for Premier Support is not the answer. Whether or
>>> not that would solve my problem, that doesn't solve GAE's problem.
>>>
>>> [1]: http://blorn.com/post/29851770158/beware-cutesy-two-letter-tlds-for-your-domain-name
>>>
>>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/google-appengine/-/tmYWDN8r2pUJ.
>
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.
>
