Re: [lopsa-discuss] Knowledge lost in time....

Lawrence K. Chen, P.Eng. Mon, 24 Jun 2013 16:45:50 -0700

----- Original Message -----
> > From: [email protected] [mailto:discuss-
> > [email protected]] On Behalf Of Lawrence K. Chen, P.Eng.
> > 
> > Good thing...I'd hate to end the uptime streak that this server
> > has....it had
> > been up 2525 days.
> 
> I've said many times before, that you shouldn't be proud of your
> uptime, because it means you're not applying updates, so you're
> exposing yourself to bugs & vulnerabilities.  I understand sometimes
> systems run in a protected environment where that's not much of a
> concern.  But there are 2-3 other reasons which you've demonstrated:
>


Yes, I have often argued such things, but since scheduling downtime to patch a
server requires signoff by a dozen or so groups, and there's no guarantee that
the patch won't break whatever equally ancient application is on the
box....approvals rarely happen.

Back when we had assigned areas of responsibility...I used to tell people that
patching had to occur during the biannual patch weeks (the week between
Christmas and New Year's, and a quiet week in July...though that quiet week had
disappeared by the time I came up to my first one...).  When that area got
assigned to another SA, the patching of those servers stopped.  Largely because
I was the only one left who had been pushing for systems to get patched.  The
other SA who was for it had quit some time ago.  But I came from an environment
where regular patching had saved my systems when the rest of the company got
knocked out.  Which we would learn about after company Internet and phone
service was restored....and the messages about the impending outage finally
reached us.  They would talk about losing tens of millions due to the worm, but
I could've been working during that time...if I hadn't had to go around and
clean up other people's computers.

But the reason I'd hate to end the uptime streak of 2525+ days....is that the
reboot would likely go one of two ways.  It works, because nothing much has
changed configuration-wise, and no bits have rotted, in that time.  Or it
disappears.

I once lost our datacenter DNS server by patching it....and rebooting it.  It
had been set up and managed by a now-former SA...and it turned out that he had
repurposed an IMAP dev server, making only command-line changes and running it
out of tmp space..... so it came up in its original role, with absolutely no
trace of its existence as a DNS server.  And tmp space isn't backed up by our
backup system. (That almost caught another group, because they were doing daily
database backups into tmp...expecting that our nightly system backup would pick
them up....fortunately, I spotted what they were doing and had them back up
somewhere else....and a week later an application update failed and they needed
to do a restore.....though it wasn't a total save, because their backups were a
tar of the database taken while it was running....)
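
The tmp problem above can be caught with a simple check before a backup job
is pointed at a directory. What follows is only a rough sketch under stated
assumptions: the EXCLUDED list and the skipped_by_backup name are hypothetical,
and a real backup system's exclusion rules would have to be read from its own
configuration.

```python
from pathlib import PurePosixPath

# Assumption: directories the nightly system backup does not cover.
EXCLUDED = [PurePosixPath("/tmp"), PurePosixPath("/var/tmp")]

def skipped_by_backup(path, excluded=EXCLUDED):
    """Return True if path is an excluded directory or lives under one."""
    p = PurePosixPath(path)
    return any(p == e or e in p.parents for e in excluded)

if __name__ == "__main__":
    # Hypothetical dump locations, for illustration only.
    for target in ("/tmp/dbdump/daily.tar", "/export/backups/daily.tar"):
        status = "NOT backed up" if skipped_by_backup(target) else "ok"
        print(f"{target}: {status}")
```

This only answers "would the dump be picked up at all"; it says nothing about
whether a tar of a running database is a usable backup.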

I spent the rest of the evening/night building a new datacenter DNS server from
scratch.....later, when I talked to him on IRC, he said something about the real
hardware for that server needing servicing, so he had set that system up as a
temporary stand-in, and I had ended an almost 2.5-year uptime for that server.

And, there have been lots of other horror stories in our datacenter along these 
lines.

Meanwhile, I'm getting a chuckle out of all the people freaking out about Java 6
being EOL'd....they have servers here and servers there that need it, and that
probably won't work with Java 7.  But most of the critical servers
mentioned....are on hardware so old it isn't on support, running an OS that
has never been patched and is also EOL (extended support is available for
extra bucks, but since the hardware isn't on support, neither is the OS).  Of
course, we've tried to get them to upgrade.  But, given the state of their
systems...the end of Java updates has no impact.

I think about the oldest in production are a Solaris 8 box and a RHEL 2.1
box....

Up until shortly after the DST change we still had some Solaris 2.6 servers in 
production, along with a SunOS 4.1.3 box. (since I remember having to figure 
out the tz tools on the respective systems to make them handle the change)

Though lately my $boss has been talking about building out new server
architectures...where the applications people have been told up front that they
have to expect that a node can and will be taken out from under them for
patching, without notice and without interruption to their users.  And to also
expect that it will be newer when it reappears and that in a week all the other
nodes will be at that level, too.
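
As a rough illustration of that idea (not a description of any real
deployment), here is a minimal rolling-patch sketch. The Node class, the
rolling_patch function, and the min_in_service parameter are all hypothetical
names. It takes one node out at a time, patches it, and returns it, and it
refuses to continue if pulling a node would leave too few in service.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    patch_level: int
    in_service: bool = True

def rolling_patch(nodes, target_level, min_in_service):
    """Patch nodes one at a time, never dropping below min_in_service."""
    log = []
    for node in nodes:
        if node.patch_level >= target_level:
            continue  # already at or beyond the target level
        serving = sum(1 for n in nodes if n.in_service)
        if serving - 1 < min_in_service:
            raise RuntimeError(f"cannot drain {node.name}: too few nodes left")
        node.in_service = False           # taken out from under the application
        log.append(f"drain {node.name}")
        node.patch_level = target_level   # stand-in for the real patch and reboot
        log.append(f"patch {node.name} -> {target_level}")
        node.in_service = True            # reappears, newer than before
        log.append(f"return {node.name}")
    return log

if __name__ == "__main__":
    cluster = [Node("a", 1), Node("b", 1), Node("c", 2)]
    for line in rolling_patch(cluster, target_level=2, min_in_service=2):
        print(line)
```

The point of min_in_service is that users only stay uninterrupted if the
application really can run with a node missing, which is the condition the
applications people are being told to design for.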

It's going as if it's what they wanted all along... though he doesn't think other
groups will be up for that kind of thing....

Oddly, I had been telling him about such ideas for years (after coming back 
from a couple of LISAs)....

-- 
Who: Lawrence K. Chen, P.Eng. - W0LKC - Senior Unix Systems Administrator
For: Enterprise Server Technologies (EST) -- & SafeZone Ally
Snail: Computing and Telecommunications Services (CTS)
Kansas State University, 109 East Stadium, Manhattan, KS 66506-3102
Phone: (785) 532-4916 - Fax: (785) 532-3515 - Email: [email protected]
Web: http://www-personal.ksu.edu/~lkchen - Where: 11 Hale Library
_______________________________________________
Discuss mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
 http://lopsa.org/
