[google-appengine] Re: Google App Engine's Team Dishonesty

trilok Sat, 19 Nov 2011 16:20:28 -0800

OK, ok, ok, so we don't talk in theory (and to answer one question -
the errors are of requests that don't reach our app. They fail
before):


Example 1
----------------
http://code.google.com/status/appengine/detail/serving-java/2011/11/15#ae-trust-detail-helloworld-secure-get-java-error_rate

- First look at the general console: you will see no mention of
this event.
- OK. Failures happen. Now, why? What is being done? Who is working on it?
During the failure, when will it be fixed? Any kind of online
information so I won't feel alone in the dark! Trust me, I'd feel
better knowing that someone brought his pet and it cut the power cord,
and it will take 30 minutes to come back, rather than nothing at all!

Example 2
----------------
http://code.google.com/p/googleappengine/issues/detail?id=6274&can=5&colspec=ID%20Type%20Component%20Status%20Stars%20Summary%20Language%20Priority%20Owner%20Log

This was during a 45-minute downtime (BTW, marked again as 'under
investigation' after being marked as 'no significant issues', so kudos
on the remark!), between the 5th and the 6th phone call I've received
from restaurants wanting to leave my system. I think this was after I
tried Tweeting app engine, and right before I was looking for Mr.
Page's personal email address :)

- Is there anyone to talk to during these times, or do I HAVE to sign
up for the $500 program just to have someone tell me when the
problem will be solved?

I don't think the GAE team is anything but very, very professional. I
don't think I would ever manage to set up such a wonderful
service myself. I just think that their customer relations need a bit of
work (which will definitely result in more people joining and
staying!).

 - Yoav.


On Nov 20, 2:01 am, "Brandon Wirtz" <[email protected]> wrote:
> We may just have different locations for where our apps are... But we have
> uptime monitoring on several dozen domains and they get checked every 5
> minutes, so we are checked something like every 15 seconds for the uptime of
> our primary app.  We've had 90 seconds or so of downtime, and during that
> time static files still served and requests that took less than 20 seconds
> to fill were served.
>
> We have seen issues where long time to fill requests had high fail rates.
> We have also seen that if we set the application settings to have too few
> idle instances that we got a LOT of 500 errors.
>
> Do you have your app set to automatic? Or have you clamped your app's number
> of idle instances?  How long does a typical request take to fill? How about
> a "long" request?
>
> Some of your downtime may be your own fault, not GAE's. Don't know that for
> sure, but when my multiple apps don't exhibit a behavior I assume that the
> issue isn't system wide, but localized to something a given user is doing.
>
> -----Original Message-----
> From: [email protected]
>
> [mailto:[email protected]] On Behalf Of trilok
> Sent: Saturday, November 19, 2011 3:42 PM
> To: Google App Engine
> Subject: [google-appengine] Re: Google App Engine's Team Dishonesty
>
> Hello,
>
> Me again :)
>
> Yes, we are using HRD. Indeed since we moved to it (at least 4-5 months
> ago), things became stabler... stable enough? Good question.
>
> I have a monitoring SW (running on EC2) making a request every minute.
> In the past week this monitoring system gave me at least 2 errors per day
> and sometimes more (500 - Internal Server Error... and no, the request never
> reached our app. It fails before us). I know it seems like a very low
> number, but still I'd like to have one day without an error. (2 disappointed
> clients a day for me, as a very young start-up, can cause some very bad
> brand reputation).
>
> Regarding the dishonesty issue - It still amazes me that at times I see an
> "Investigating" or "Elevated" sign in the system status, sometimes 45
> minutes of a very high Java latency, and the next day "No significant
> issues" on the previous day. I, and I'm guessing the rest of the people
> here, would really appreciate some kind of acknowledgment from Google that
> you have seen the issue, and didn't just "let it disappear" but rather
> investigated it, found the cause, and are performing steps to make sure it
> does not reappear.
>
> Basically what I'm asking, and what I think everyone is asking here, is to
> know that there is someone to talk to. If you look back at the 'issues'
> site, you'll see many 'production' issues from people like me crying for
> help during downtime. These issues have gone unanswered even now, months
> after the issues. If you'll look at other monitoring sites you'll see that
> there is some kind of description of the issues as they happen. Now, I fully
> understand that during times of disruptions you guys are amazingly busy in
> trying to solve them, but perhaps just a word from a human-being and not an
> automated SW to show that we have someone there helping us, and perhaps,
> just perhaps an ETA on a solution?
>
> Thank you again, and sorry for the long posts - it is just frustrating
> having nothing to do during downtimes other than refreshing the status
> monitor and praying (last downtime, 45 minutes, I prayed to every religion's
> god - anything that can work, I don't discriminate during downtimes :) )
>
>  - Yoav
>
> On Nov 19, 11:55 pm, "Gregory D'alesandre" <[email protected]> wrote:
> > Hello all,
>
> > Trying to show an accurate and honest representation of the status of
> > a massive distributed service is a really hard technical challenge but
> > an even harder conceptual one.  While your app might be showing higher
> > latency or errors that doesn't indicate a systematic issue with the whole
> > service.  For instance, the main reason we are encouraging customers to move to
> > HRD is because M/S is dependent on single BigTable tablets, this means
> > you can have lots of issues when there is absolutely nothing wrong
> > systematically with GAE.  There was a small service disruption (on the
> > order of minutes) in some HRD apps recently when a datacenter was
> > having systematic issues so we had to re-route that traffic.  But, it
> > didn't show up on the status site because it was a short disruption
> > that only affected a small portion of users.
>
> > The upshot of this is that our status site gives a general sense of
> > how App Engine is running but that doesn't show whether your app is
> > experiencing issues or not, it just shows whether the probes we are
> > using to generate the information are having issues.  So, at times,
> > when it says there is a problem initially and then it disappears it is
> > usually because a prober app was having an issue but it was not a
> > large-scale issue for all of GAE.  We are not trying to hide issues,
> > quite the contrary when there is a large-scale systematic issue we
> > have a policy of doing post-mortems and posting them publicly.  So, if
> > your app is having an issue and the status site looks fine, this is
> > probably not a lie but rather an artifact of how we show status for the
> > system.
>
> > We are looking into ways to improve this to be more useful but I hope
> > that helps clarify why you see what you see.
>
> > Yoav, I never saw a response as to whether you are using HRD or not,
> > the 99.95% SLA only applies to HRD because we know M/S is going to have
> > issues.
> > As always, thanks for the feedback!
>
> > Greg D'Alesandre
> > Senior Product Manager, Google App Engine
>
> > On Fri, Nov 18, 2011 at 7:07 PM, WallyDD <[email protected]> wrote:
> > > I have always wondered the same thing. One minute there is an issue,
> > > a few days later it never happened.
> > > Google is far from alone with such issues which is why there are
> > > websites/services that monitor cloud status.
>
> > > It may be a little unfair calling the app engine team dishonest.
> > > Trying to change something in a large organization can be a very
> > > unrewarding experience.
>
> > > On Nov 17, 9:24 am, trilok <[email protected]> wrote:
> > > > Hello Google App Engine,
>
> > > > Let me first specify that I am a paying app engine user for about
> > > > 1.5 years. We, at my company, have developed an online restaurant
> > > > takeout/ delivery ordering service running completely on the
> > > > appengine. We currently serve over 50 restaurants in my home
> > > > country, and are now expanding abroad with restaurants in Canada,
> > > > Hungary, Belgium, UK, and more.
>
> > > > Ever since the appengine's release to production a week ago,
> > > > there have been 3 (!!!) major disruptions - on the 7th for 45 minutes,
> > > > yesterday for 30 minutes, and right now. I understand that
> > > > failures occur, but specifying a "99.95%" and being so far from it
> > > > is to me a major failure on the part of Google.
>
> > > > To make matters worse, we, AppEngine's paying users, NEVER receive
> > > > any explanation or description of the cause of the failure, the
> > > > solution, and Google's efforts to prevent its recurrence.
> > > > Not by any means to compare, but EC2's team constantly admits and
> > > > reports ALL of the failures, along with a debriefing!
>
> > > > And now for the "cherry on the top", and the reason I used the
> > > > word 'dishonesty' - You remove any note of the disruption from
> > > > System Status. For example, yesterday there was a disruption
> > > > causing 40 secs (!!!) of latency in response. Today, viewing the
> > > > System Status, yesterday is marked with "No significant issues".
> > > > That to me is dishonesty and a clear-cut lie.
>
> > > > Unfortunately, our service is now so deeply connected to the
> > > > AppEngine framework that leaving this service is currently not an
> > > > option, but I would definitely not advise or recommend anyone to
> > > > use the AppEngine today, and my next product will definitely not run
> > > > on the AppEngine.
>
> > > > Regards,
>
> > > >  - Yoav.
>
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "Google App Engine" group.
> > > To post to this group, send email to [email protected].
> > > To unsubscribe from this group, send email to
> > > [email protected].
> > > For more options, visit this group at
> > > > http://groups.google.com/group/google-appengine?hl=en.
>

