Hello everyone,
As you might have noticed, we had a major issue in the GNOME infrastructure
last night, which rendered almost every service we provide unavailable.
This was caused by our main file server, which stopped serving the file
systems required for home directories and mailing lists.
The cause of the outage is currently unclear, as the logs do not show
anything relevant.
We've sent them to the Gluster engineers to ask for help analyzing them.
When we rebooted the server, something went wrong, requiring a power cycle of
the affected machine.
When we tried this, we hit a bug in the management cards that prevented us
from using them to reboot the server.
Because of this, we requested hands-on service to get the server power
cycled, which left us waiting for some time.
Within minutes of the server being rebooted, the file systems came back
online, and with them all of the GNOME services.
To prevent all services from going down if the primary file server goes
down, we had previously set up a synchronized secondary file server.
We were unable to make all servers fall back to this one because we could not
log in to the affected servers to update the target IP.
To prevent this problem from pulling down the entire GNOME infrastructure in
the future, we have taken some steps:
- We have added a way for us to log in to any server even if the home
  directories are down.
- We'll be introducing automatic failover to the other available file server.
- We'll be mirroring our documentation off-site to prevent the relevant
  documentation from becoming unavailable when the machine hosting it
  is experiencing problems.
- We will make sure we have access to the power management of our
  servers, so we can reboot them even if the management
  cards are not functioning.
We really hope that this will prevent such drastic failures in the future, and
make it easier to recover if problems do occur.
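For those curious what the automatic failover mentioned above could look like,
here is a rough sketch in Python. It is an illustration only and not our actual
setup: the host names, the port (24007, assumed here as the probe target) and
the function names are all hypothetical. The idea is simply to try each file
server in order of preference and use the first one that answers.

```python
import socket
from typing import Callable, Optional, Sequence


def is_reachable(host: str, port: int = 24007, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    The default port is an assumption made for this sketch.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def pick_file_server(
    candidates: Sequence[str],
    probe: Callable[[str], bool] = is_reachable,
) -> Optional[str]:
    """Return the first candidate that passes the probe, or None if none do."""
    for host in candidates:
        if probe(host):
            return host
    return None


if __name__ == "__main__":
    # Hypothetical host names, listed in order of preference.
    servers = ["fs-primary.example", "fs-secondary.example"]
    target = pick_file_server(servers)
    print(target if target is not None else "no file server reachable")
```

A client would run a check like this before (re)mounting, instead of relying on
someone logging in to change the target IP by hand.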
If you have any additional questions, don't hesitate to contact either of us on
IRC (#sysadmin) or by sending us an email.
With kind regards,
Patrick Uiterwijk and Andrea Veri
System Administrators, GNOME
_______________________________________________
desktop-devel-list mailing list
[email protected]
https://mail.gnome.org/mailman/listinfo/desktop-devel-list