Hey guys,

I understand your frustration. We have many Google services deployed on top
of App Engine as well, and we get pressure from both sides anytime events
impact production. There are several issues around communication being
highlighted here:

- App Engine Status page was being updated when we were having latency
problems
- Status page did not accurately describe the impact
- Delay between when we recognized a production event and when we posted to
downtime-notify
- Downtime-notify emails are being marked as spam

We've attempted to take steps in the past to resolve the spam issue, though
users are reporting that it hasn't worked. As far as the delay between us
accurately identifying a production event and updating the groups - well,
we'll have to figure out how we can minimize that. At the very least, there
will be an internal communications post-mortem about what we plan to do
to address, or at least minimize the impact of, these issues.

The option to go into an unplanned maintenance period was on the table, but
it was one of those situations where we assessed it as overkill, especially
since there were periods when the latency appeared to have died down, only to
restart again. We don't want to be too trigger happy with unscheduled
downtime, as degraded performance is usually preferable to a complete
downtime state (many of you may disagree with me, but this is a bit of a
judgment call depending on the level of degradation). We're cautiously
optimistic at the moment about performance, but at this point, if the spikes
begin appearing again, we may initiate another full downtime period. Stay
tuned to the downtime-notify list. I'll let my team members know to post
using their @google.com accounts to avoid being marked as spam.

On Wed, Sep 15, 2010 at 4:10 PM, Cameron <[email protected]> wrote:

> Hi Ikai -
>
> Just to be clear the issues have NOT subsided since last night.  I
> hope you guys are working on eliminating the root of the problem and
> not just "monitoring the service closely" as the latest App Engine
> Notify post suggests.
>
>
> http://groups.google.com/group/google-appengine-downtime-notify/browse_thread/thread/9cf3b0cafdd6c235
>
> And even though the status page says "this spike did not affect the
> performance or uptime of applications" - every spike DOES in fact
> affect the performance of applications (mine at least - GQueues, but
> probably all apps).  These red spikes make my app inaccessible.  Even
> the yellow spikes cause many 500 errors.  Basically this makes my app
> unusable, because users can't get any consistent work done with the
> frequent errors.
>
>
> http://code.google.com/status/appengine/detail/datastore/2010/09/15#ae-trust-detail-datastore-get-latency
>
> Most of all, the frequent errors make my app seem very brittle and
> deteriorates user confidence.  Sales drop and my own forum gets lots
> of complaints.  And then of course people start posting on Twitter.
>
> Anyway, I'm sure you guys are working very hard to fix the issues and
> want App Engine to be as reliable as possible.  My suggestion is that
> you also look to improve communication during these times.  I have to
> respond to my own users during these situations.  This becomes very
> difficult when all I can tell them is "Google thinks the issue is
> resolved and is just monitoring the situation" when clearly the status
> graphs indicate otherwise and people can't access my app.  Or "Google
> says the issue didn't affect performance or uptime" when clearly it
> has.
>
> -Cameron
>
>
> On Sep 15, 6:13 am, Arny <[email protected]> wrote:
> > We're still getting a lot of 500s (dashboard & front end).
> >
> > Did you transfer our apps to lower-cost servers, or why is
> > everything working so badly since the maintenance?
> > When are the REAL paid services coming?
> >
> > On Sep 15, 6:17 am, "Ikai Lan (Google)" <[email protected]> wrote:
> >
> >
> >
> > > Hi Tim,
> >
> > > You can track the progress here:
> >
> > >http://groups.google.com/group/google-appengine-downtime-notify/brows.
> ..
> >
> > > It's pretty hard to give an ETA, but we'd like to resolve this as soon as
> > > possible. We're seeing signs that the issues may have subsided, but we'd
> > > like a bit more confidence before giving the all clear.
> >
> > > On Tue, Sep 14, 2010 at 6:49 PM, Tim Hoffman <[email protected]> wrote:
> > > > Hi
> >
> > > > http://groups.google.com.au/group/google-appengine-downtime-notify/br...
> > > > was posted several hours ago, with no updates.
> >
> > > > I certainly am experiencing significant ongoing issues with taskqueues
> > > > and datastore timeouts and half the time can't get to the dashboard.
> >
> > > > I know someone must be working hard on this, but a little more detail
> > > > on progress, ie ETA to recovery would be really great
> >
> > > > Thanks
> >
> > > > Tim
> >
> > > > --
> > > > You received this message because you are subscribed to the Google Groups
> > > > "Google App Engine" group.
> > > > To post to this group, send email to [email protected].
> > > > To unsubscribe from this group, send email to
> > > > [email protected].
> > > > For more options, visit this group at
> > > >http://groups.google.com/group/google-appengine?hl=en.
>
