Thank you, Gregory. You say it affected a small portion of users, and you later removed the issue notification from the GAE status page. That also makes your status history and availability counter look better to new customers who come to GAE and check the status page. Is this honest?
You never know; maybe next time your app will be within that small portion of incidents.

On Nov 19, 2011 9:55 PM, "Gregory D'alesandre" <[email protected]> wrote:
> Hello all,
>
> Trying to show an accurate and honest representation of the status of a massive distributed service is a really hard technical challenge, but an even harder conceptual one. While your app might be showing higher latency or errors, that doesn't indicate a systematic issue with the whole service. For instance, the main reason we are encouraging customers to move to HRD is that M/S is dependent on single BigTable tablets. This means you can have lots of issues when there is absolutely nothing wrong systematically with GAE. There was a small service disruption (on the order of minutes) in some HRD apps recently when a datacenter was having systematic issues, so we had to re-route that traffic. But it didn't show up on the status site because it was a short disruption that only affected a small portion of users.
>
> The upshot is that our status site gives a general sense of how App Engine is running. It doesn't show whether your app is experiencing issues; it only shows whether the probes we use to generate the information are having issues. So when the site initially reports a problem and the report later disappears, it is usually because a prober app had an issue, not because there was a large-scale issue for all of GAE. We are not trying to hide issues. Quite the contrary: when there is a large-scale systematic issue, we have a policy of doing post-mortems and posting them publicly. So if your app is having an issue and the status site looks fine, this is probably not a lie but rather an artifact of how we show status for the system.
>
> We are looking into ways to make this more useful, but I hope that helps clarify why you see what you see.
>
> Yoav, I never saw a response as to whether you are using HRD or not. The 99.95% SLA only applies to HRD because we know M/S is going to have issues.
>
> As always, thanks for the feedback!
>
> Greg D'Alesandre
> Senior Product Manager, Google App Engine
>
> On Fri, Nov 18, 2011 at 7:07 PM, WallyDD <[email protected]> wrote:
>
>> I have always wondered the same thing. One minute there is an issue; a few days later it never happened.
>> Google is far from alone with such issues, which is why there are websites/services that monitor cloud status.
>>
>> It may be a little unfair to call the App Engine team dishonest. Trying to change something in a large organization can be a very unrewarding experience.
>>
>> On Nov 17, 9:24 am, trilok <[email protected]> wrote:
>> > Hello, Google App Engine,
>> >
>> > Let me first state that I have been a paying App Engine user for about 1.5 years. At my company, we have developed an online restaurant takeout/delivery ordering service that runs entirely on App Engine. We currently serve over 50 restaurants in my home country, and we are now expanding abroad with restaurants in Canada, Hungary, Belgium, the UK, and more.
>> >
>> > Ever since App Engine's release to production a week ago, there have been 3 (!!!) major disruptions: on the 7th for 45 minutes, yesterday for 30 minutes, and right now. I understand that failures occur, but promising "99.95%" and being so far from it is, to me, a major failure on the part of Google.
>> >
>> > To make matters worse, we, App Engine's paying users, NEVER receive any explanation of the cause of a failure, its solution, or Google's efforts to prevent it from recurring. Not to draw comparisons, but EC2's team consistently admits and reports ALL of its failures, with debriefings!
>> >
>> > And now for the "cherry on top", and the reason I used the word 'dishonesty': you remove any note of the disruption from System Status. For example, yesterday there was a disruption causing 40 secs (!!!) of response latency. Viewing System Status today, yesterday is marked "No significant issues". That, to me, is dishonesty and a clear-cut lie.
>> >
>> > Unfortunately, our service is now so deeply tied to the App Engine framework that leaving App Engine is currently not an option. But I would definitely not advise or recommend anyone to use App Engine today, and my next product will definitely not run on App Engine.
>> >
>> > Regards,
>> >
>> > - Yoav.
>>
>> --
>> You received this message because you are subscribed to the Google Groups "Google App Engine" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to [email protected].
>> For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
