[google-appengine] Re: Error handling during downtime

'Tiago (Google Cloud Platform Support)' via Google App Engine Wed, 15 May 2019 20:11:20 -0700

Hello,

The Cloud Datastore SLA <https://cloud.google.com/datastore/sla> 
deliberately leaves many of the questions posed here unanswered: it's 
extremely hard to predict whether downtime will happen all at once or 
intermittently, as such events are most often unplanned. Indeed, a quick 
glance at previous incidents 
<https://status.cloud.google.com/summary#cloud-datastore> shows that both 
kinds occurred in the past year. When designing your application, it's 
probably better to treat such unknowns abstractly and implement general 
fail-safe mechanisms. For instance, if a write fails, you can catch the 
Datastore exception and enqueue a task to retry it later.
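
As a rough illustration of the catch-and-enqueue idea, here is a minimal 
Python sketch. Everything named in it is an assumption made for the 
example: TransientStoreError stands in for whatever retryable exception 
your Datastore client raises, write_fn stands in for your actual write 
call, and the in-memory deque stands in for a real task queue, which 
would need to be durable.

```python
import collections


class TransientStoreError(Exception):
    """Hypothetical stand-in for a retryable Datastore error."""


# Stand-in for a real, durable task queue; this one lives in memory only.
retry_queue = collections.deque()


def save_or_enqueue(write_fn, entity, queue=retry_queue):
    """Try the write once; on a transient failure, defer it to the queue.

    Returns True if the write succeeded now, False if it was deferred.
    """
    try:
        write_fn(entity)
        return True
    except TransientStoreError:
        queue.append(entity)
        return False


def drain(write_fn, queue=retry_queue):
    """Replay deferred writes once each, as a queue worker might.

    Returns the number of entities written during this pass.
    """
    written = 0
    for _ in range(len(queue)):
        entity = queue.popleft()
        try:
            write_fn(entity)
            written += 1
        except TransientStoreError:
            queue.append(entity)  # still failing; leave it for the next pass
    return written
```

The request handler would call save_or_enqueue and respond normally in 
either case, while a background worker calls drain periodically. Deferred 
writes should be idempotent, since a retried write may be applied more 
than once.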


That being said, given the small downtime budget allocated to Cloud 
Datastore (and its generally reliable track record), issues are more 
commonly caused by an implementation that does not follow the general 
best practices <https://cloud.google.com/datastore/docs/best-practices> 
or by sub-optimal design 
<https://cloud.google.com/appengine/articles/scalability>. You will gain 
more in overall app reliability by giving those topics proper attention 
during development.

On Friday, April 26, 2019 at 12:21:50 PM UTC-4, dir Ls wrote:
>
> Cloud Datastore has a 99.95% monthly uptime SLA for multi-region, which 
> translates to slightly more than 20 minutes of downtime per month. Is 
> this downtime likely 
> to happen all at once or intermittently? What kind of errors are to be 
> expected during the downtime? I am trying to figure out the strategy 
> required to be put in place on how the app should respond to end users 
> during the downtime. Would it be possible that it works for data related to 
> some users but not others at a given time? I am looking for 
> best-practice guidance for an app that is expected to be usable 24/7 with 
> graceful degradation based on the underlying services. For example, if the 
> downtime is intermittent, users might just reload the page and won't even 
> know something went wrong. But if the downtime is prolonged, explicitly 
> displaying that the system is currently inaccessible and asking them to 
> come back after some time might be better.
>
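
To make the numbers and the intermittent-versus-prolonged distinction in 
the question above concrete, here is a small Python sketch. It is only an 
illustration under assumptions: a 30-day month, a 60-second threshold 
chosen arbitrarily, and a hypothetical OutageTracker helper that is not 
part of any Google library.

```python
import time


def downtime_budget_minutes(sla_percent, days_in_month=30):
    """Minutes of downtime per month permitted by an uptime percentage."""
    return days_in_month * 24 * 60 * (100.0 - sla_percent) / 100.0


class OutageTracker:
    """Tracks how long consecutive failures have lasted (hypothetical)."""

    def __init__(self, prolonged_after_seconds=60, clock=time.monotonic):
        self.prolonged_after_seconds = prolonged_after_seconds
        self.clock = clock
        self.first_failure_at = None

    def record_success(self):
        # Any success ends the current failure streak.
        self.first_failure_at = None

    def record_failure(self):
        # Remember only when the current failure streak started.
        if self.first_failure_at is None:
            self.first_failure_at = self.clock()

    def mode(self):
        """Return 'ok', 'retry' (brief blip), or 'banner' (prolonged)."""
        if self.first_failure_at is None:
            return "ok"
        elapsed = self.clock() - self.first_failure_at
        if elapsed >= self.prolonged_after_seconds:
            return "banner"
        return "retry"
```

With a 99.95% SLA, downtime_budget_minutes(99.95) comes to roughly 21.6 
minutes for a 30-day month. The app would call record_failure or 
record_success after each Datastore call, then retry quietly while mode() 
is 'retry' and show an explicit "temporarily unavailable" message once it 
returns 'banner'. In a multi-instance app the tracker state would need to 
be shared or kept per instance, which this sketch does not address.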

