Joyent operations guy here; no, it wasn't me that hit the Big Red Button. (Unrelatedly, tomorrow is my last day there, so... I can say nice things about my team, right?)
Internal culture at Joyent is pretty damn responsible, especially on the sysadmin side of the house. Don't f*ck with my coworkers, or I will end you. We've got each other's backs, in the best of ways.

The postmortem posted is quite specific and accurate -- the outage was caused by a fairly complex sociotechnical situation, and some outright code bugs, which are now being addressed. Recovery time was incredibly short given the difficult nature of the problems encountered.

There's some relevant-to-this-group hiring going on at Joyent -- do pop a resume in if you're looking for new opportunities.

best,
--e

On Thu, May 29, 2014 at 10:55 AM, Moose Finklestein <[email protected]> wrote:

> Oh, yes, we've all been there. Typed 'reboot' in the wrong window. Done
> 'newfs' on the wrong dev. Told someone, "Go press the alarm button" only to
> watch in horror as they push the EPO. Oh, yeah.
>
> The best part of this tale, I think, is that the company's attitude of
> "Well, the person screwed up and knows it; we don't see any need to beat
> them further than they're beating themself." It's a refreshing and
> intelligent change from the typical "Of course we fired the person who did
> this!" that comes with a public disaster.
>
> _______________________________________________
> Discuss mailing list
> [email protected]
> https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
> This list provided by the League of Professional System Administrators
> http://lopsa.org/
