On Fri, Jan 17, 2014 at 9:02 PM, Robin H. Johnson <[email protected]> wrote:

> overlays.gentoo.org service has been restored on a new system.
> Some statistics and a post-mortem follow.
>
> Special thanks to antarus and a3li for all their interactions with our
> sponsor, and managing most of the details. I just did the final data
> recovery and this writeup.
>
> Please resume using the service, and if you see something weird that you
> think is different from before, please file a bug for Infrastructure.
>
> In the process, the service moved to a new machine. The SSH keys have
> changed as follows:
> DSA: d6:71:99:1f:46:c9:42:95:e1:9d:be:8e:f7:76:51:b5
> RSA: 92:b5:40:16:63:a3:61:9f:d7:63:64:ba:d5:51:41:b9
> ECDSA: 96:f0:29:e6:d4:85:58:46:31:ba:0e:17:0b:8c:fa:d8
>
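
For anyone who wants to check a host key against the fingerprints above
without trusting the client prompt alone, here is a minimal sketch. It
assumes the colon-separated values are the classic MD5 fingerprints, i.e.
the MD5 digest of the base64-decoded key blob, and that you have the key as
a "<key-type> <base64-blob> [comment]" line. The function names are made up
for illustration.

```python
import base64
import hashlib


def md5_fingerprint(pubkey_line: str) -> str:
    """Return the colon-separated MD5 fingerprint of a public key line.

    Assumed input format: "<key-type> <base64-blob> [comment]".
    """
    fields = pubkey_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<key-type> <base64-blob> [comment]'")
    blob = base64.b64decode(fields[1])      # raw key blob
    digest = hashlib.md5(blob).hexdigest()  # 32 hex characters
    # Group the hex digest into 16 colon-separated byte pairs.
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def matches_announced(pubkey_line: str, announced: str) -> bool:
    """Compare a computed fingerprint with one copied from the announcement."""
    return md5_fingerprint(pubkey_line) == announced.strip().lower()
```

Note that a known_hosts entry has the host name as its first field, so that
field would need to be dropped before passing the line in.
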
> At this time, we will NOT be restoring Trac due to low demand. If you
> still require a web-based SVN browser for old SVN repos, please contact
> us at [email protected].
>

For Trac wiki users: the recommendation is to move to wiki.gentoo.org. If
you haven't migrated, and you need a copy of your Trac wiki pages from
overlays.gentoo.org, please file a bug against infra and someone (me) will
restore them on a request-by-request basis. I think the deal is that I
can pretty trivially give you a tarball of markup files (one per wiki page).

-A


>
> If you have a dev/ repo under the list 'IMPORTANT' below, you MUST push
> to the server again.
>
> IMPORTANT: The following repos were damaged beyond repair, and were not
> available in backups. You'll need to push again, I have reset the repos to
> empty:
> dev/anarchy.git
> dev/dberkholz.git
> dev/dev-zero.git
> dev/dilfridge.git
> dev/fordfrog.git
> dev/graaff.git
> dev/maekke.git
> dev/mschiff.git
> dev/quantumsummers.git
> dev/zorry.git
>
> FYI: The following repos appeared to be empty:
> dev/b33fc0d3.git
> dev/moult.git
> dev/tomwij.git
> user/blueicefield.git
> user/disinbox.git
> user/palatis.git
> user/paragon.git
> user/vmalov.git
> user/xray.git
>
> FYI: The following repos contained dangling commits/tags/blobs, and this
> should not be considered new breakage; if you have a newer copy, you are
> encouraged to push again:
> dev/blueness.git
> dev/maksbotan.git
> dev/mgorny.git
> dev/qiaomuf.git
> dev/xmw.git
> proj/betagarden.git
> proj/catalyst.git (+tags)
> proj/devmanual.git
> proj/dotnet.git
> proj/elfix.git (+tags)
> proj/emacs-tools.git
> proj/gamerlay.git
> proj/hardened-dev.git
> proj/hardened-patchset.git
> proj/kde.git
> proj/lisp.git
> proj/openrc.git (+tags)
> proj/portage.git
> proj/ruby-overlay.git
> proj/sci.git
> proj/sunrise.git
> proj/webapp-config.git
> proj/x11.git
> user/gmt.git
> user/mv.git (+blobs)
> user/palmer.git
>
> Statistics:
> -----------
>   354 repos total
> -  10 repos unrecoverable (all in dev/)
> = 344 repos recovered/available
>
>     9 repos that seem to be empty
>    26 repos with dangling commits/tags/blobs
>     2 repos recovered from external sources.
>
> Breakdown by path:
> ------------------
> 193 proj/ repos
>  69 dev/  repos
>  91 user/ repos
>   1 other repo
>
> Post-mortem
> -----------
> Hornbill went offline around: 2014-01-10 13:13 UTC
> Hornbill last started a backup of VCS: 2014-01-10 07:59:04 UTC
> Hornbill last completed a backup of VCS: 2014-01-10 08:20:54 UTC
>
> Between the backup starting, and the server going offline, we were able
> to confirm writes to the following Git repos:
> dev/fordfrog.git
> proj/kde.git
> gitolite-admin.git
>
> We believe that there were no writes to user/ repos, but are not 100%
> certain, as the logging was insufficient for this purpose.
>
> Hornbill went offline just over a week ago: Mid-afternoon on a Friday
> for the timezone where it's located. Due to staff turnover and business
> changes at the previous sponsor, we were not able to contact anybody
> until regular office hours on Monday, January 13th.
>
> The server in question, while previously functioning, was not
> recoverable after a remote hands reboot on Monday afternoon (UTC).
> On Tuesday, the sponsor was able to examine it in more depth, and
> it was not recoverable. More concerningly, it turned out to be one of
> the few remaining Gentoo infrastructure systems with IDE drives. The
> data was recovered; however, it seemed to have a lot of corruption.
>
> It was noted that our backups were missing all of the dev/ repos, due to
> a system-wide rule to exclude /dev/ from backups (the rule should only
> match the real /dev, not any directory simply named "dev"). For this
> reason, we decided to try and get the data from the old server.
>
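
To make the /dev/ exclusion problem concrete, here is a small sketch of the
difference between a rule that matches any path component named "dev" and
one anchored at the filesystem root. It does not reproduce the actual backup
tool or its rule syntax, which the writeup does not name; the function names
and the /srv/git path in the usage note are hypothetical.

```python
from pathlib import PurePosixPath


def excluded_unanchored(path: str, name: str = "dev") -> bool:
    """Exclude the path if ANY of its components equals `name`."""
    return name in PurePosixPath(path).parts


def excluded_anchored(path: str, root: str = "/dev") -> bool:
    """Exclude the path only if it is `root` itself or lies beneath it."""
    p = PurePosixPath(path)
    r = PurePosixPath(root)
    return p == r or r in p.parents
```

With these, a hypothetical repo path such as /srv/git/dev/graaff.git is
skipped by the unanchored rule but kept by the anchored one, while /dev/sda
is skipped by both.
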
> Verification/recovery of the remaining data was also slowed by having
> to confirm that some of the Git repos in the backup were not entirely
> clean, containing legacy errors that turned out to be false positives
> from their CVS/SVN conversions, or dangling commits/blobs/tags.
>
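
If you want to see whether your own copy of a repo has the same kind of
dangling objects, "git fsck" reports them as lines of the form
"dangling <type> <sha>". The sketch below only tallies such lines from fsck
output you have already captured; it is not the procedure used for the
recovery, and the function name is made up.

```python
from collections import Counter


def count_dangling(fsck_output: str) -> Counter:
    """Tally 'dangling <type> <sha>' lines by object type."""
    counts = Counter()
    for line in fsck_output.splitlines():
        fields = line.split()
        # Only lines shaped exactly like "dangling commit <sha>" are counted.
        if len(fields) == 3 and fields[0] == "dangling":
            counts[fields[1]] += 1
    return counts
```

A repo whose tally shows only dangling commits, tags or blobs falls into the
"FYI" list above rather than the damaged one.
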
> What could we do better next time:
> ----------------------------------
> - Have backups of all repos!
> - Compare the age of the backup immediately, and consider going live
>   with the backup. Only 5 hours of work would have been lost, and even
>   then possibly only temporarily, due to the distributed nature of Git.
> - More people need to use the infra-status page to learn about the state
>   of Gentoo services.
>
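
The "5 hours" figure follows from the timestamps in the post-mortem. A quick
sketch of the arithmetic, assuming the outage happened at exactly 13:13:00
(the writeup only says "around 13:13") and measuring from the start of the
last backup, since writes made after that point may not be in it:

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def exposure_window(backup_started: str, outage: str) -> timedelta:
    """Time between the last backup start and the outage (both UTC)."""
    return datetime.strptime(outage, FMT) - datetime.strptime(backup_started, FMT)


# Backup start is from the post-mortem; the outage seconds are assumed.
window = exposure_window("2014-01-10 07:59:04", "2014-01-10 13:13:00")
print(window)  # 5:13:56
```
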
> Actions for Infra
> -----------------
> - Include dev/ repos in the backup (they were missing before)
> - Set up Gitolite mirroring
> - Review gitolite logging (needs to be easier to confirm when writes
>   took place)
>
> --
> Robin Hugh Johnson
> Gentoo Linux: Developer, Infrastructure Lead
> E-Mail     : [email protected]
> GnuPG FP   : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
>
