We may just have different locations for where our apps are hosted... But
we have uptime monitoring on several dozen domains, each checked every 5
minutes, so our primary app ends up being checked something like every 15
seconds.  We've had 90 seconds or so of downtime, and during that time
static files were still served, as were requests that took less than 20
seconds to fill.
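
As a rough sketch of that arithmetic: the figure of 20 checks per 5-minute
cycle is an assumption back-calculated from the "every 15 seconds" estimate,
the checks are assumed to be evenly staggered, and the helper names are made
up for illustration.

```python
def effective_check_interval(cycle_seconds: float, checks_per_cycle: int) -> float:
    """Seconds between consecutive checks when they are spread evenly over one cycle."""
    if checks_per_cycle <= 0:
        raise ValueError("checks_per_cycle must be positive")
    return cycle_seconds / checks_per_cycle


def min_checks_during_outage(outage_seconds: float, interval_seconds: float) -> int:
    """Fewest evenly spaced checks that must land inside an outage of this length."""
    return int(outage_seconds // interval_seconds)


if __name__ == "__main__":
    # Assumption: 20 checks reach the primary app in each 5-minute (300 s) cycle.
    interval = effective_check_interval(300, 20)
    print("seconds between checks:", interval)
    # The roughly 90-second outage mentioned above.
    print("checks inside the outage:", min_checks_during_outage(90, interval))
```

Under those assumptions a 90-second outage would be seen by about six
checks, which is why short blips still show up in this kind of monitoring.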

We have seen issues where requests with a long time to fill had high failure
rates.  We have also seen that if we set the application settings to have
too few idle instances, we got a LOT of 500 errors.

Do you have your app set to automatic, or have you clamped your app's number
of idle instances?  How long does a typical request take to fill? How about
a "long" request?

Some of your downtime may be your own fault, not GAE's. Don't know that for
sure, but when my multiple apps don't exhibit a behavior I assume that the
issue isn't system wide, but localized to something a given user is doing.


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of trilok
Sent: Saturday, November 19, 2011 3:42 PM
To: Google App Engine
Subject: [google-appengine] Re: Google App Engine's Team Dishonesty

Hello,

Me again :)

Yes, we are using HRD. Indeed since we moved to it (at least 4-5 months
ago), things became more stable... stable enough? Good question.

I have a monitoring SW (running on EC2) making a request every minute.
In the past week this monitoring system gave me at least 2 errors per day
and sometimes more (500 - Internal Server Error... and no, the request never
reached our app. It fails before us). I know it seems like a very low
number, but still I'd like to have one day without an error. (2 disappointed
clients a day for me, as a very young start-up, can cause some very bad
brand reputation).
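
To put those numbers next to the 99.95% figure quoted in this thread, here
is a minimal sketch. It treats every probe as an equal sample and assumes a
30-day accounting window; neither is necessarily how the SLA is actually
measured, and the function names are hypothetical.

```python
PROBES_PER_DAY = 24 * 60  # one probe per minute, as described above
SLA_TARGET = 0.9995       # the "99.95%" figure quoted in this thread


def observed_availability(failed: int, total: int) -> float:
    """Fraction of probes that succeeded."""
    return 1.0 - failed / total


def allowed_failures(target: float, total: float) -> float:
    """How many units (probes, minutes, ...) may fail while still meeting the target."""
    return (1.0 - target) * total


if __name__ == "__main__":
    # Two failed probes per day, as reported by the monitoring system.
    print(f"observed availability: {observed_availability(2, PROBES_PER_DAY):.3%}")
    print(f"failed probes allowed per day: {allowed_failures(SLA_TARGET, PROBES_PER_DAY):.2f}")
    # Assumption: a 30-day window for the downtime budget.
    minutes_in_window = 30 * 24 * 60
    print(f"downtime minutes allowed per 30 days: {allowed_failures(SLA_TARGET, minutes_in_window):.1f}")
```

Two failures out of 1440 probes is about 99.86%, while 99.95% leaves room
for fewer than one failed probe per day (0.72). Over 30 days the same target
allows about 21.6 minutes of downtime, against the 45 + 30 = 75 minutes
reported in the original message.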

Regarding the dishonesty issue - It still amazes me that at times I see an
"Investigating" or "Elevated" sign in the system status, sometimes 45
minutes of a very high Java latency, and the next day "No significant
issues" on the previous day. I, and I'm guessing the rest of the people
here, would really appreciate some kind of acknowledgment from Google that
you have seen the issue, and didn't just "let it disappear" but rather
investigated it, found the cause, and are performing steps to make sure it
does not reappear.

Basically what I'm asking, and what I think everyone is asking here, is to
know that there is someone to talk to. If you look back at the 'issues'
site, you'll see many 'production' issues from people like me crying for
help during downtime. These issues have gone unanswered even now, months
after the issues. If you'll look at other monitoring sites you'll see that
there is some kind of description of the issues as they happen. Now, I fully
understand that during times of disruptions you guys are amazingly busy in
trying to solve them, but perhaps just a word from a human-being and not an
automated SW to show that we have someone there helping us, and perhaps,
just perhaps an ETA on a solution?

Thank you again, and sorry for the long posts - It is just frustrating
having nothing to do during down times other than refreshing the status
monitor and praying (last downtime, 45 minutes, I prayed to every religion's
god - anything that can work, I don't discriminate during down times :) )

 - Yoav

On Nov 19, 11:55 pm, "Gregory D'alesandre" <[email protected]> wrote:
> Hello all,
>
> Trying to show an accurate and honest representation of the status of 
> a massive distributed service is a really hard technical challenge but 
> an even harder conceptual one.  While your app might be showing higher 
> latency or errors that doesn't indicate a systematic issue with the
> whole service.  For instance, the main reason we are encouraging
> customers to move to 
> HRD is because M/S is dependent on single BigTable tablets, this means 
> you can have lots of issues when there is absolutely nothing wrong 
> systematically with GAE.  There was a small service disruption (on the 
> order of minutes) in some HRD apps recently when a datacenter was 
> having systematic issues so we had to re-route that traffic.  But, it 
> didn't show up on the status site because it was a short disruption 
> that only affected a small portion of users.
>
> The upshot of this is that our status site gives a general sense of 
> how App Engine is running but that doesn't show whether your app is 
> experiencing issues or not, it just shows whether the probes we are 
> using to generate the information are having issues.  So, at times, 
> when it says there is a problem initially and then it disappears it is 
> usually because a prober app was having an issue but it was not a 
> large-scale issue for all of GAE.  We are not trying to hide issues, 
> quite the contrary when there is a large-scale systematic issue we 
> have a policy of doing post-mortems and posting them publicly.  So, if 
> your app is having an issue and the status site looks fine, this is 
> probably not a lie but rather an artifact of how we show status for the
> system.
>
> We are looking into ways to improve this to be more useful but I hope 
> that helps clarify why you see what you see.
>
> Yoav, I never saw a response as to whether you are using HRD or not, 
> the 99.95% SLA only applies to HRD because we know M/S is going to have
> issues.  As always, thanks for the feedback!
>
> Greg D'Alesandre
> Senior Product Manager, Google App Engine
>
> On Fri, Nov 18, 2011 at 7:07 PM, WallyDD <[email protected]> wrote:
> > I have always wondered the same thing. One minute there is an issue, 
> > a few days later it never happened.
> > Google is far from alone with such issues which is why there are 
> > websites/services that monitor cloud status.
>
> > It may be a little unfair calling the app engine team dishonest.
> > Trying to change something in a large organization can be a very 
> > unrewarding experience.
>
> > On Nov 17, 9:24 am, trilok <[email protected]> wrote:
> > > Google app engine hello,
>
> > > Let me first specify that I am a paying app engine user for about 
> > > 1.5 years. We, at my company, have developed an online restaurant 
> > > takeout/ delivery ordering service running completely on the 
> > > appengine. We currently serve over 50 restaurants in my home 
> > > country, and are now expanding abroad with restaurants in Canada, 
> > > Hungary, Belgium, UK, and more.
>
> > > Ever since the appengine's release from production a week ago, 
> > > there have been 3 (!!!) major disruptions - On the 7th for 45 minutes, 
> > > yesterday for 30 minutes, and right now. I understand that 
> > > failures occur, but specifying a "99.95%" and being so far from it 
> > > is to me a major failure on the part of Google.
>
> > > To make matters worse, we, AppEngine's paying users, NEVER receive 
> > > any explanations or descriptions of the cause of the failure, the 
> > > solution and Google's efforts to prevent its recurrence. 
> > > Not by any means to compare, but EC2's team constantly admits and 
> > > reports ALL of its failures and their debriefings!
>
> > > And now for the "cherry on the top", and the reason I used the 
> > > word 'dishonesty' - You remove any note of the disruption from 
> > > System Status. For example, yesterday there was a disruption 
> > > causing 40 secs (!!!) of latency in response. Today viewing the 
> > > System Status, yesterday is marked with "No significant issues". 
> > > That to me is dishonesty and a clear cut lie.
>
> > > Unfortunately, our service is now so deeply connected to the 
> > > AppEngine framework that leaving this service is currently not an 
> > > option, but I would definitely not advise or recommend anyone to 
> > > use the AppEngine today, and my next product will definitely not 
> > > run on the AppEngine.
>
> > > Regards,
>
> > >  - Yoav.
>
> > --
> > You received this message because you are subscribed to the Google 
> > Groups "Google App Engine" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to
> > [email protected].
> > For more options, visit this group at 
> >http://groups.google.com/group/google-appengine?hl=en.
