Hi guys,

Today our app went offline for 30 minutes because we hit our daily budget of $1,000. The cost was primarily in database reads; normally we'd expect $100-$200.
There were some database timeouts just before we went offline. One initial theory is that the database failing caused many tasks to retry continually. Unfortunately, I think that at the time we were also doing a full backup and syncing our customer records with MailChimp. Did anyone else notice anything similar?

Has anyone got ideas about how to limit cost blowouts like this when there is some problem with the underlying infrastructure and you have a lot of failing tasks basically DoS'ing your own site? The only "solution" I can think of is to have a really, really high daily budget and try to periodically detect abnormal usage... That's quite risky, though, if you don't detect it in time.

It's hard to know who should fix this. Is it App Engine's job, because the infrastructure (potentially) failed? Or is it our fault for not trying to detect that? Detection is really hard when you're running a really big MapReduce, for example, where you expect a few errors, but you don't want errors to continue to fester and then take your site offline.

Maybe there could be a setting to automatically pause a task queue and email the administrators if X% of tasks fail within a certain timeframe?

Cheers,
Mike

-- 
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/google-appengine?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
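The "pause a task queue and email the administrators if X% of tasks fail within a certain timeframe" idea from the post above can be sketched as a sliding-window failure-rate breaker. This is a hypothetical illustration only, not a real App Engine API: the class name, parameters, and the fact that you must wire the trip condition to your own pause/notify actions are all assumptions.

```python
import time
from collections import deque


class FailureRateBreaker:
    """Trips when the recent task failure rate crosses a threshold.

    Hypothetical sketch of the 'pause the queue if X% of tasks fail'
    idea; what to do when it trips (pause the queue, email admins)
    is left to the caller, since that depends on your infrastructure.
    """

    def __init__(self, threshold=0.5, window_seconds=300, min_samples=20):
        self.threshold = threshold        # e.g. 0.5 means trip at 50% failures
        self.window_seconds = window_seconds
        self.min_samples = min_samples    # avoid tripping on a handful of tasks
        self.events = deque()             # (timestamp, succeeded) pairs
        self.tripped = False

    def record(self, succeeded, now=None):
        """Record one task outcome; return True if the breaker is tripped."""
        now = time.time() if now is None else now
        self.events.append((now, succeeded))
        # Drop outcomes that have fallen out of the sliding window.
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()
        # Only evaluate the rate once we have a meaningful sample size.
        if len(self.events) >= self.min_samples:
            failures = sum(1 for _, ok in self.events if not ok)
            if failures / len(self.events) >= self.threshold:
                self.tripped = True
        return self.tripped
```

Each task handler would call `record(...)` after it succeeds or fails, and a tripped breaker would be the signal to stop dispatching (and alert a human) rather than letting retries burn the budget. The `min_samples` floor matters for the big-MapReduce case: a few expected errors never trip it, only a sustained high failure rate does.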
