overlays.gentoo.org service has been restored on a new system. Some statistics and a post-mortem follow.
Special thanks to antarus and a3li for all their interactions with our sponsor, and for managing most of the details. I just did the final data recovery and this writeup.

Please resume using the service, and if you see something weird that you think is different from before, please file a bug for Infrastructure.

In the process, the service moved to a new machine. The SSH host keys have changed; the new fingerprints are:
DSA:   d6:71:99:1f:46:c9:42:95:e1:9d:be:8e:f7:76:51:b5
RSA:   92:b5:40:16:63:a3:61:9f:d7:63:64:ba:d5:51:41:b9
ECDSA: 96:f0:29:e6:d4:85:58:46:31:ba:0e:17:0b:8c:fa:d8

At this time, we will NOT be restoring Trac due to low demand. If you still require a web-based SVN browser for old SVN repos, please contact us at [email protected].

If you have a dev/ repo under the list 'IMPORTANT' below, you MUST push to the server again.

IMPORTANT: The following repos were damaged beyond repair, and were not available in backups. You'll need to push again; I have reset the repos to empty:
dev/anarchy.git
dev/dberkholz.git
dev/dev-zero.git
dev/dilfridge.git
dev/fordfrog.git
dev/graaff.git
dev/maekke.git
dev/mschiff.git
dev/quantumsummers.git
dev/zorry.git

FYI: The following repos appeared to be empty:
dev/b33fc0d3.git
dev/moult.git
dev/tomwij.git
user/blueicefield.git
user/disinbox.git
user/palatis.git
user/paragon.git
user/vmalov.git
user/xray.git

FYI: The following repos contained dangling commits/tags/blobs, and this should not be considered new breakage; if you have a newer copy, you are encouraged to push again:
dev/blueness.git
dev/maksbotan.git
dev/mgorny.git
dev/qiaomuf.git
dev/xmw.git
proj/betagarden.git
proj/catalyst.git (+tags)
proj/devmanual.git
proj/dotnet.git
proj/elfix.git (+tags)
proj/emacs-tools.git
proj/gamerlay.git
proj/hardened-dev.git
proj/hardened-patchset.git
proj/kde.git
proj/lisp.git
proj/openrc.git (+tags)
proj/portage.git
proj/ruby-overlay.git
proj/sci.git
proj/sunrise.git
proj/webapp-config.git
proj/x11.git
user/gmt.git
user/mv.git (+blobs)
user/palmer.git
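If you want to check whether your own copy of a repo has dangling objects before deciding to push again, something like the following may help. This is only a rough sketch, not the procedure used during the recovery: it assumes git is installed, and the function names count_dangling and fsck_repo are made up for illustration. It relies on "git fsck" reporting unreachable objects as lines of the form "dangling <type> <sha>".

```python
import subprocess
from collections import Counter


def count_dangling(fsck_output: str) -> Counter:
    """Count 'dangling <type> <sha>' lines in `git fsck` output, by object type."""
    counts = Counter()
    for line in fsck_output.splitlines():
        fields = line.split()
        # Only lines with exactly three fields starting with 'dangling' are counted.
        if len(fields) == 3 and fields[0] == "dangling":
            counts[fields[1]] += 1
    return counts


def fsck_repo(repo_path: str) -> Counter:
    """Run `git fsck` inside repo_path (illustrative helper) and summarise the result."""
    result = subprocess.run(
        ["git", "fsck", "--no-progress"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=False,  # fsck may exit non-zero on damaged repos; we still want the output
    )
    # Parse both streams so the summary does not depend on where git writes each line.
    return count_dangling(result.stdout + "\n" + result.stderr)
```

Calling fsck_repo("/path/to/your/clone") returns a Counter keyed by object type (commit, tag, blob, tree). An empty result means no dangling objects were reported.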
Statistics:
-----------
354 repos total
- 10 repos unrecoverable (all in dev/)
= 344 repos recovered/available

9 repos that seem to be empty
26 repos with dangling commits/tags/blobs
2 repos recovered from external sources.

Breakdown by path:
------------------
193 proj/ repos
 69 dev/ repos
 91 user/ repos
  1 other repo

Post-mortem
-----------
Hornbill went offline around:                2014-01-10 13:13 UTC
Hornbill last started a backup of VCS:       2014-01-10 07:59:04 UTC
Hornbill last completed a backup of VCS:     2014-01-10 08:20:54 UTC

Between the backup starting and the server going offline, we were able to confirm writes to the following Git repos:
dev/fordfrog.git
proj/kde.git
gitolite-admin.git

We believe that there were no writes to user/ repos, but are not 100% certain, as the logging was insufficient for this purpose.

Hornbill went offline just over a week ago: mid-afternoon on a Friday in the timezone where it is located. Due to staff turnover and business changes at the previous sponsor, we were not able to contact anybody until regular office hours on Monday, January 13th.

The server in question, while previously functioning, was not recoverable after a remote hands reboot on Monday afternoon (UTC). On Tuesday, the sponsor was able to examine it in more depth, and it was still not recoverable. More concerningly, it turned out to be one of the few remaining Gentoo infrastructure systems with IDE drives. The data was recovered; however, it seemed to have a lot of corruption.

It was noted that our backups were missing all of the dev/ repos, due to a system-wide rule to exclude /dev/ from backups (the rule should only match the real /dev, not any directory simply named "dev"). For this reason, we decided to try and get the data from the old server.
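To illustrate the difference between the two kinds of exclude rule, here is a minimal sketch. It is not the actual backup configuration (the backup tool and its rule syntax are not described in this mail); the function names and the /srv/git paths are hypothetical and exist only to show how a rule matching any directory named "dev" catches far more than a rule anchored at the real /dev.

```python
from pathlib import PurePosixPath


def excluded_by_name(path: str, name: str = "dev") -> bool:
    """Overly broad rule: exclude any path that has a component called `name`."""
    return name in PurePosixPath(path).parts


def excluded_by_anchored_path(path: str, root: str = "/dev") -> bool:
    """Narrow rule: exclude only `root` itself and anything beneath it."""
    p = PurePosixPath(path)
    r = PurePosixPath(root)
    return p == r or r in p.parents


# Hypothetical paths; the real repository location on the server is an assumption.
paths = [
    "/dev/sda1",
    "/srv/git/dev/fordfrog.git",
    "/srv/git/proj/kde.git",
]

for path in paths:
    print(
        f"{path}: by-name={excluded_by_name(path)} "
        f"anchored={excluded_by_anchored_path(path)}"
    )
```

With the by-name rule, the hypothetical /srv/git/dev/fordfrog.git is skipped along with the device nodes; with the anchored rule, only paths under the real /dev are skipped.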

Verification/recovery of the remaining data was also hampered by the discovery that some of the Git repos in the backup were not entirely clean: they contained either legacy errors from their CVS/SVN conversions, which turned out to be false positives, or dangling commits/blobs/tags.

What could we do better next time:
----------------------------------
- Have backups of all repos!
- Check the age of the backup immediately, and consider going live with the backup. Only 5 hours of work would have been lost, and even then possibly only temporarily, due to the distributed nature of Git.
- More people need to use the infra-status page to learn about the state of Gentoo services.

Actions for Infra
-----------------
- Include the dev/ repos in the backup
- Set up Gitolite mirroring
- Review gitolite logging (needs to be easier to confirm when writes took place)

-- 
Robin Hugh Johnson
Gentoo Linux: Developer, Infrastructure Lead
E-Mail   : [email protected]
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
