On Fri, Jan 17, 2014 at 9:02 PM, Robin H. Johnson <[email protected]> wrote:
> overlays.gentoo.org service has been restored on a new system.
> Some statistics and a post-mortem follow.
>
> Special thanks to antarus and a3li for all their interactions with our
> sponsor, and for managing most of the details. I just did the final data
> recovery and this writeup.
>
> Please resume using the service, and if you see something weird that you
> think is different from before, please file a bug for Infrastructure.
>
> In the process, the service moved to a new machine. The SSH keys have
> changed as follows:
> DSA: d6:71:99:1f:46:c9:42:95:e1:9d:be:8e:f7:76:51:b5
> RSA: 92:b5:40:16:63:a3:61:9f:d7:63:64:ba:d5:51:41:b9
> ECDSA: 96:f0:29:e6:d4:85:58:46:31:ba:0e:17:0b:8c:fa:d8
>
> At this time, we will NOT be restoring Trac due to low demand. If you
> still require a web-based SVN browser for old SVN repos, please contact
> us at [email protected].
>

For Trac wiki users, the recommendation is to move to wiki.gentoo.org. If
you haven't migrated and you need a copy of your Trac wiki pages from
overlays.gentoo.org, please file a bug against infra and someone (me) will
restore them for you on a request-by-request basis. I think the deal is
that I can pretty trivially give you a tarball of markup files (one per
wiki page).

-A

>
> If you have a dev/ repo under the list 'IMPORTANT' below, you MUST push
> to the server again.
>
> IMPORTANT: The following repos were damaged beyond repair, and were not
> available in backups.
> You'll need to push again; I have reset the repos to empty:
> dev/anarchy.git
> dev/dberkholz.git
> dev/dev-zero.git
> dev/dilfridge.git
> dev/fordfrog.git
> dev/graaff.git
> dev/maekke.git
> dev/mschiff.git
> dev/quantumsummers.git
> dev/zorry.git
>
> FYI: The following repos appeared to be empty:
> dev/b33fc0d3.git
> dev/moult.git
> dev/tomwij.git
> user/blueicefield.git
> user/disinbox.git
> user/palatis.git
> user/paragon.git
> user/vmalov.git
> user/xray.git
>
> FYI: The following repos contained dangling commits/tags/blobs. This
> should not be considered new breakage; if you have a newer copy, you are
> encouraged to push again:
> dev/blueness.git
> dev/maksbotan.git
> dev/mgorny.git
> dev/qiaomuf.git
> dev/xmw.git
> proj/betagarden.git
> proj/catalyst.git (+tags)
> proj/devmanual.git
> proj/dotnet.git
> proj/elfix.git (+tags)
> proj/emacs-tools.git
> proj/gamerlay.git
> proj/hardened-dev.git
> proj/hardened-patchset.git
> proj/kde.git
> proj/lisp.git
> proj/openrc.git (+tags)
> proj/portage.git
> proj/ruby-overlay.git
> proj/sci.git
> proj/sunrise.git
> proj/webapp-config.git
> proj/x11.git
> user/gmt.git
> user/mv.git (+blobs)
> user/palmer.git
>
> Statistics:
> -----------
> 354 repos total
> - 10 repos unrecoverable (all in dev/)
> = 344 repos recovered/available
>
> 9 repos that seem to be empty
> 26 repos with dangling commits/tags/blobs
> 2 repos recovered from external sources.
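The "dangling commits/tags/blobs" classification above resembles what `git fsck` reports: objects present in the repository but unreachable from any ref are printed as lines of the form `dangling <type> <sha>`. The email does not say which tool produced the list, so this is only a sketch assuming `git fsck` output as input; `summarize_dangling` and `annotate` are hypothetical helper names, and the "(+tags)"/"(+blobs)" formatting merely imitates the list above.

```python
from collections import Counter


def summarize_dangling(fsck_output: str) -> Counter:
    """Count dangling objects by type in `git fsck` output (hypothetical helper)."""
    counts: Counter = Counter()
    for line in fsck_output.splitlines():
        parts = line.split()
        # git fsck reports unreachable-but-present objects as "dangling <type> <sha>".
        if len(parts) == 3 and parts[0] == "dangling":
            counts[parts[1]] += 1
    return counts


def annotate(repo: str, counts: Counter) -> str:
    """Render a repo line in the style of the list above, e.g. 'proj/openrc.git (+tags)'."""
    # Only tags and blobs get an explicit marker; dangling commits are the default case.
    extras = [f"+{kind}s" for kind in ("tag", "blob") if counts[kind]]
    return repo + (f" ({', '.join(extras)})" if extras else "")


sample = "dangling commit aaaa\ndangling commit bbbb\ndangling tag cccc\n"
print(annotate("proj/openrc.git", summarize_dangling(sample)))
```

In practice the input would be the standard output of `git fsck` run inside each bare repository.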
>
> Breakdown by path:
> ------------------
> 193 proj/ repos
> 69 dev/ repos
> 91 user/ repos
> 1 other repo
>
> Post-mortem
> -----------
> Hornbill went offline around: 2014-01-10 13:13 UTC
> Hornbill last started a backup of VCS: 2014-01-10 07:59:04 UTC
> Hornbill last completed a backup of VCS: 2014-01-10 08:20:54 UTC
>
> Between the backup starting and the server going offline, we were able
> to confirm writes to the following Git repos:
> dev/fordfrog.git
> proj/kde.git
> gitolite-admin.git
>
> We believe that there were no writes to user/ repos, but are not 100%
> certain, as the logging was insufficient for this purpose.
>
> Hornbill went offline just over a week ago: mid-afternoon on a Friday
> in the timezone where it's located. Due to staff turnover and business
> changes at the previous sponsor, we were not able to contact anybody
> until regular office hours on Monday, January 13th.
>
> The server in question, while previously functioning, was not
> recoverable after a remote-hands reboot on Monday afternoon (UTC).
> On Tuesday, the sponsor was able to examine it in more depth, and it
> was still not recoverable. More concerningly, it turned out to be one
> of the few remaining Gentoo infrastructure systems with IDE drives. The
> data was recovered; however, it appeared to have a lot of corruption.
>
> It was noted that our backups were missing all of the dev/ repos, due to
> a system-wide rule to exclude /dev/ from backups (the rule should only
> match the real /dev, not any directory simply named "dev"). For this
> reason, we decided to try to get the data from the old server.
>
> Verification/recovery of the remaining data was also hampered by the
> discovery that some of the Git repos in the backup were not entirely
> clean: they contained legacy errors that turned out to be false
> positives from their CVS/SVN conversions, or dangling commits/blobs/tags.
>
> What could we do better next time:
> ----------------------------------
> - Have backups of all repos!
> - Compare the age of the backup immediately, and consider going live
>   with the backup. Only about 5 hours of work would have been lost, and
>   even then possibly only temporarily, due to the distributed nature of
>   Git.
> - More people need to use the infra-status page to learn about the state
>   of Gentoo services.
>
> Actions for Infra
> -----------------
> - Include dev/ repos in the backup
> - Set up Gitolite mirroring
> - Review gitolite logging (it needs to be easier to confirm when writes
>   took place)
>
> --
> Robin Hugh Johnson
> Gentoo Linux: Developer, Infrastructure Lead
> E-Mail : [email protected]
> GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
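The backup gap described in the post-mortem is the difference between an unanchored exclude pattern (any path component named "dev") and an anchored one (only the real /dev at the filesystem root). The email does not name the backup tool, so this sketch uses plain Python path matching rather than any tool's real exclude syntax; the function names and the repository path below are hypothetical.

```python
from pathlib import PurePosixPath


def excluded_unanchored(path: str, name: str = "dev") -> bool:
    """Buggy rule: exclude any path that has a component named `name`."""
    return name in PurePosixPath(path).parts


def excluded_anchored(path: str, root: str = "/dev") -> bool:
    """Intended rule: exclude only the real /dev tree at the filesystem root."""
    p, r = PurePosixPath(path), PurePosixPath(root)
    return p == r or r in p.parents


# Hypothetical location of a gitolite repo under a dev/ subdirectory.
repo = "/var/lib/gitolite/repositories/dev/fordfrog.git"
print(excluded_unanchored(repo), excluded_anchored(repo))
```

The unanchored rule drops both `/dev/sda` and the repo path, while the anchored rule keeps the repo in the backup.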

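The "only about 5 hours" estimate follows from the post-mortem timestamps: anything written after the last backup started may be missing from it, so the worst-case loss window runs from backup start to the outage. A minimal sketch of that arithmetic, treating the approximate 13:13 UTC outage time as exact; `exposure_window` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

# Timestamps from the post-mortem (the outage time is approximate).
backup_start = datetime(2014, 1, 10, 7, 59, 4, tzinfo=UTC)
backup_end = datetime(2014, 1, 10, 8, 20, 54, tzinfo=UTC)
outage = datetime(2014, 1, 10, 13, 13, 0, tzinfo=UTC)


def exposure_window(start: datetime, went_offline: datetime) -> timedelta:
    """Worst-case span of writes that may be missing from a backup begun at `start`."""
    return went_offline - start


print(exposure_window(backup_start, outage))  # 5:13:56
print(backup_end - backup_start)  # 0:21:50, how long the backup run took
```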