Apparently, though unproven, at 00:49 on Wednesday 03 November 2010, walt did 
opine thusly:

> On 11/02/2010 03:05 PM, Alan McKinnon wrote:
> > Right now I sit with 60+ SLES 9 machines that cannot be taken offline for
> > any reason, and EVERY SINGLE ONE has one giant filesystem...
> > 
> > How did this happen? The man in charge three managers ago thought this
> > was a cool way to configure critical servers. Because "One filesystem
> > mounted at /" was option #1 on the disk page of the SLES install wizard.
> 
> Thanks, I'm relieved to know that I'm not cut from managerial cloth :)
> 
> I'm assuming that SUSE releases security patches from time to time.  How
> do you keep all those machines up to date if you can't take them offline?


Maintenance time slots. A reboot after installing a new kernel takes less than 
5 minutes and nothing else really requires a reboot, so this passes the Change 
Management process easily. Other updates usually need only a service restart, 
which can be done on the fly. So "never take offline" doesn't actually mean 
*never*; it means "outside agreed service levels".

Fixing / means taking the machine offline for $X hours, where $X is some large 
number depending on how big / is and how fast tar runs. And the Change Manager 
asks his usual horrible questions:

What's the risk?
What's the impact?
Is this customer facing?
Does this problem reduce quality of service to customers?
Don't you have Nagios to manage exactly this kind of thing?

His response to my answers is usually something like:

"You're kidding me right? This is another one of Alan's pranks, right?"

-- 
alan dot mckinnon at gmail dot com
