Hi everyone,

As you may have noticed, yesterday was marked by rather consistent CI failures. I believe I have now fixed the root cause, but do let me know if you see any further issues.
Below I have included a brief post-mortem describing the incident and what we plan to do to prevent it from recurring.

Cheers,

- Ben

# Post-mortem

Early Friday morning our storage provider experienced a hiccup which rendered the volume backing our GitLab repositories unavailable for a few minutes. The interruption was long enough that the filesystem was remounted read-only. This caused a bit of filesystem damage affecting a handful of objects in the ghc/perf-notes repository, which in turn caused CI failures when attempts to `git push` to this repository failed.

To address this I started by ensuring we had an up-to-date backup of our data and began sorting through the various observed errors. Once I had established that the storage volume had been interrupted I went looking for additional corruption. An integrity check of all GitLab repositories revealed no further corruption. This isn't terribly surprising given that the perf-notes repository sees the most commit traffic of all repositories hosted on GitLab.

While it would likely be possible to recover the corrupted perf-notes objects from clones on the CI builders, I deemed this to be not worth the effort given that these commits hold replaceable performance metric data and the last good commit appears to have been produced mere hours prior to the corrupted HEAD commit. Consequently, I instead reverted the ghc/perf-notes repository to the last-known-good commit (8154013bfdce86fedf2863cb96ccbb723f1144f8).

# Planned changes for future mitigation

While this incident didn't result in any significant data loss, it was nevertheless a significant headache and resulted in CI failing for the better part of a day. Moreover, this isn't the first time that the network block storage backing GitLab has let us down. At this point, I have lost confidence that network block storage can be trusted.
For this reason, in the future we will eliminate this failure mode by adjusting our deployment to avoid relying on network block storage for any deployment-critical data. We hope that we will be able to carry out this change in the coming weeks.
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs