On Sun, 27 Jul 2003 01:30:11 PDT, Neeko Oni <[EMAIL PROTECTED]> said:
> (Oh, and why aren't those 70-80% patched at SP4 with RPC firewalled?)

Umm... because there's a *lot* of PCs that are sitting on people's kitchen tables without a sysadmin who knows what SP4 is?

> > You've obviously never administered a network with 10,000 servers.

> Typical. "Waahh, I can't be expected to firewall, patch, or otherwise protect
> my machines! When will I have time to play Counter-Strike and leech porn?"
> You aren't /actually/ paid to play CS and download pornography, sir. That's
> why you have to take classes in buzzwordology, remember?

OK. I admit it. Our machine room is only a shade over a quarter acre (12,000 square feet or so), and there's closer to 1,000 machines than 10K in it. You'd have to count the 30K desktop boxes to get it to the 10K criterion (and due to political considerations, a large fraction of those 30K are a lot less under our control than we might want - it isn't like we can just use group policy to push patches...)

As it is, we *STILL* get hosed by interdependencies and similar issues when scheduling semi-emergency downtime. You want to put a patch on the mail server? Well, that's all fine and good, but if you don't allow enough time for everybody to get the "the server will be down" message, the phone will be ringing off the hook down in user services. So you need to schedule more people to answer the phone, so *real* problems don't drop through the cracks while you're busy saying "Yes, the mail server is down, didn't you read the e-mail announcement we sent 5 mins ago?" 70,000 times...

Oh, and that server over there? You have three 2-hour windows per week to install patches, and by the way you can't use the Thursday slot because a hardware failure screwed up the production schedule and if you down the box Thursday you won't get paid Friday. And your test slot is 4AM to 6AM, and if the box isn't back up and running perfectly at 6:01, somebody will be royally peeved, so if the install, test, and backout procedures total more than 2 hours, you are in deep shit if something goes wrong....

Oh, and those 4 machines over there? You can't take that database down during the next 3 test windows because that OTHER development group needs them up and running for their testing that has a "must complete" status because if you don't get the changes the external auditors asked for done by Aug 1, somebody's getting fined $10K/day till it's fixed (funny how legal requirements can motivate you.. ;)

You can wave your arms all you want and say "Well then, get a policy that says that the users can go masturbate themselves, you're putting the patch on anyhow". However, the *political reality* is that usually there will be at least one person at the VP level who (a) doesn't understand security and (b) intends to have your testicles on a platter because your short-notice downtime screwed up something on his radar that he considered important. You get to kiss this VP's ass or find another job (good luck in this economy).

And oh yeah... you get to *test* the frikking patches before you deploy them because you don't trust them. Download them, install them on your 27 different sacrificial-lamb test servers and beat the hell out of them for a few days, in case there's a problem. Why? Because if you install a bad patch and 3 nights later the backups fail to run and the machine crashes instead, losing 2 days' work, that VP from the previous paragraph is going to come looking for you....

A little over a week ago, we had an accidental Halon dump when some construction workers screwed up some wiring. The contractor turned a bit paler when he found out that they were getting billed for $50K in Halon replacement. He turned a LOT paler when he found out he was going to get billed for the sysadmin time to bring all the servers back up. He just about fainted when he found out the bill was *also* going to include all the people who were idled for 4 hours because all the databases they need to do their work were down. And *THAT* part of the bill came to over $200K.

And that, my friends, is the *real* problem with upgrading 10K servers - even if 9,500 of them are nice clean "install and reboot", that still leaves you 500 problem servers that some VP will come looking for you holding a dull knife and a bottle of steak sauce....
