Hello everyone,

As you might have noticed, we had a major issue in the GNOME infrastructure last night, which rendered almost every service we provide unavailable. This was caused by our main file server no longer serving the file systems required for home directories and mailing lists.

The cause of the outage is currently not clear, as the logs are not showing anything relevant. We've sent them to the Gluster engineers to ask for help analyzing them.

On rebooting the server, something went wrong, requiring a power cycle of the affected machine. When trying this, we were hit by a bug in the management cards that made us unable to use them to reboot the server. Because of this, we requested hands-on service to get the server power cycled, which had us waiting for some time. Within minutes of the server being rebooted, the file systems came back online, and with them all of the GNOME services.

To prevent all services from going down if the primary file server goes down, we had previously set up a synchronized secondary file server. The reason we were unable to make all servers fall back to this one was that we weren't able to log in to the affected servers to update the target IP.

To prevent this problem from pulling down the entire GNOME infrastructure in the future, we have taken some steps:

- We have added a way for us to log in to any server even if the home directories are down.
- We'll be introducing automatic failover to the other available file server.
- We'll be spreading our documentation off-site to prevent the relevant documentation from disappearing when the machine hosting it is experiencing problems.
- We will be making sure to get access to the power management of our servers, so we can reboot them even if the management cards are not functioning.

We really hope that this will prevent such drastic failures in the future, and make it easier to recover if problems do occur.

If you have any additional questions, don't hesitate to contact either of us on IRC (#sysadmin) or by sending us an email.

With kind regards,
Patrick Uiterwijk and Andrea Veri
System Administrators, GNOME
_______________________________________________
foundation-list mailing list
foundation-list@gnome.org
https://mail.gnome.org/mailman/listinfo/foundation-list