On Mon, Oct 7, 2019 at 1:44 PM Maarten Dammers <[email protected]> wrote:
>
> Andrew,
>
> I got 30+ emails and counting for stale file handles, permission errors,
> email bounces, etc. Not sure what definition of brief you're using for
> the outage (in my line of work that's sometimes less than a second), but
> all these emails seem to have been sent in a two-hour window based on
> the timestamps: start 17:42, end 19:55 (both Amsterdam time). Maybe you
> can stop cron next time you're planning to break toollabs? Or was
> this caused by the DNS + network outage?

The errors in Toolforge were unintended side effects of the OpenStack
upgrade. The network issues cascaded to cause a variety of issues
across Cloud VPS projects including Toolforge. We will be working on
an incident report and it will be shared with this list when it has
been prepared.

Those of us on the "root@" emails got 400+ emails triggered by
various parts of the service interruption, so we have empathy for the
inbox problem this caused for others. :/

Bryan
-- 
Bryan Davis              Technical Engagement      Wikimedia Foundation
Principal Software Engineer                               Boise, ID USA
[[m:User:BDavis_(WMF)]]                                      irc: bd808

_______________________________________________
Wikimedia Cloud Services mailing list
[email protected] (formerly [email protected])
https://lists.wikimedia.org/mailman/listinfo/cloud
