> From: [email protected] [mailto:discuss-[email protected]] On Behalf Of Lawrence K. Chen, P.Eng.
>
> Good thing...I'd hate to end the uptime streak that this server has....it had
> been up 2525 days.

I've said many times before that you shouldn't be proud of your uptime, because it means you're not applying updates, so you're exposing yourself to bugs and vulnerabilities. I understand that systems sometimes run in a protected environment where that's not much of a concern. But there are 2-3 other reasons, which you've demonstrated.

First: "Why does it work" to log in at the console instead of over ssh when a system is out of disk or memory? Well, oftentimes it doesn't, so you were lucky it worked for you this time. The same thing that prevents sshd from spawning another process sometimes prevents bash, or ls, or kill, or rm from spawning or otherwise working properly. But if you're patient enough, if you retry enough times, and/or if you're lucky, sometimes it works.

When a system is out of memory or out of disk, the processes it runs tend to become unstable, because the "out of memory" and I/O errors that hit them are inconsistent: as tiny chunks of memory and disk get freed by other processes or reclaimed by the kernel, a running process will sometimes succeed and sometimes not, until a few minutes elapse and the system is likely completely wedged. After you alleviate the cause of the problem, you really *should* reboot in order to assure system stability.

Which brings up the second point. If you never reboot, then you don't know whether your system is able to survive a reboot. I can't describe how many times I've inherited systems from former admins (usually fired) who never rebooted, or how much frustration I've had to endure as a result. It's painfully common that there is no documentation of which daemons need to be running, what system dependencies exist (first boot machine #1 and ensure the "foobar" service is available before booting machine #2, etc.), or how they were started in the first place.

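For what it's worth, here is a minimal sketch of one way to capture that kind of inventory while the box is still healthy, so the next admin has something to go on. It assumes a POSIX-style `ps` that accepts `-eo pid,ppid,user,args`; the function names (`parse_ps`, `reparented_to_init`, `snapshot`) are made up for illustration. Treating "parent is PID 1" as a hint that something may have been launched by hand (e.g. with nohup from a shell that has since exited) is only a heuristic: legitimate daemons also have PID 1 as their parent, and some systems reparent orphans elsewhere.

```python
import subprocess


def parse_ps(text):
    """Parse the output of `ps -eo pid,ppid,user,args` into a list of dicts."""
    rows = []
    for line in text.splitlines()[1:]:      # skip the header row
        parts = line.split(None, 3)         # the args column may contain spaces
        if len(parts) < 4:
            continue                        # ignore blank or truncated lines
        pid, ppid, user, args = parts
        rows.append({
            "pid": int(pid),
            "ppid": int(ppid),
            "user": user,
            "args": args.rstrip(),
        })
    return rows


def reparented_to_init(rows):
    """Return processes whose parent is PID 1.

    Heuristic only: this list mixes real daemons with things that were
    started by hand, so it is a list of candidates to check against your
    init scripts or unit files, not a list of offenders.
    """
    return [r for r in rows if r["ppid"] == 1]


def snapshot():
    """Run ps once and return the parsed process table (assumes ps is on PATH)."""
    out = subprocess.run(
        ["ps", "-eo", "pid,ppid,user,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out)
```

Calling `reparented_to_init(snapshot())` and saving the result somewhere off the machine gives you a list to compare against what your startup configuration claims should be running. It doesn't replace actual documentation, or an actual reboot test.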
If I'm lucky enough to look at the system while it's still operable, I run 'ps' and so on to make an inventory of what services are running. I log in as root, run "history", and start reading for clues. I see things like "nohup java blah blah" and "screen this and that" and ... you get the point. Makes me want to fire the person again, in front of a firing squad. Undocumented, manually launched services running in production. Brilliant.

Hopefully you at least have none of those.

_______________________________________________
Discuss mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
http://lopsa.org/
