We recently had a situation where we got overwhelmed with 'noise' in our cron output: things that are meant to be 'silent unless things go wrong' started outputting significant amounts of email. This overwhelmed the folk who track the cron output, and a real issue was missed (the librarian-gc script was in crisis).

During the team lead meeting we discussed this, and I've clarified our policies with the outcome:

Things that *support* our identification of production issues are essential to our day-to-day operations. Any [significant] disruption to them is now an immediate operational incident.

I don't see this as an actual change, rather a formalisation of the prioritisation many folk have had in the past, but formalising it gives *explicit* support to anyone who notices the issue and needs to get the ball rolling.

I've updated the various docs I could see that were relevant:
https://dev.launchpad.net/BugTriage
https://dev.launchpad.net/PolicyAndProcess/ZeroOOPSPolicy
https://wiki.canonical.com/Launchpad/PolicyandProcess/DefinitionofCriticalPolicy (sorry, internal only)

I'd love any feedback on clarity - or whether this is a crazy thing to do :P - that you might have.

-Rob