I have had the same experience with a heavily loaded NNTP server. We have to reboot the system about every 10-12 weeks. It looks like a memory leak in the kernel (2.4.7).
Robert, who has a machine running solaris 2.5.1 with an uptime of 1386 (!) days ;-)

----- Original Message -----
From: "David Boyes" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, April 23, 2002 6:28 PM
Subject: Re: MTBF

> > > The record for us is about 9 months for a single Linux image. Average
> > > is about 3-4 months between reboots, depending on what's running in
> > > them -- things that suck up lots of memory like Websphere tend to
> > > shorten the lifespan of the machine by fragmenting storage. Machines
> > > that get a lot of interactive use tend to collect a few zombies after
> > > a while, so reboots become a reasonably good idea after a while.
> >
> > I have to say that I'm a little surprised at that recommendation.
>
> No, THIS IS NOT A RECOMMENDATION. This is a descriptive observation.
>
> The failures we see appear to be memory related, and there are some cases
> where if you cut interactive users loose and let them do their stuff, they
> create random garbage, et al. This is pretty standard stuff for lots of
> interactive processing sites -- clear the decks periodically even if it's
> not sick.
>
> > Seems like I've heard lots of tales of people with Linux up
> > much longer than 9 months... doing web services, etc... do you
> > think your 9 month figure is a function of the 390 version
> > of Linux, or Linux in general?
>
> No, I think its a function of how we make upgrade decisions and/or ops
> policy. I suspect that you could go longer, but I wanted to share a data
> point.
>
> > That is, would you recommend
> > rebooting a PC version of Linux on the same interval (given
> > the same workload?)
>
> Given my workload, probably. My users are rude, cranky, and badly behaved.
> They can break anything...8-).
>
> > Also - just to ask - what about the BSD variants - would you
> > also recommend 9 months for them?
>
> See above. If the users do stupid things, you're about in the same position
> no matter what the OS.
>
> > Could you relate more about this 9 month figure? Do you have
> > specific instances where it was required, that you can share
> > of course...
>
> The 9 month one was a power failure on site with a P390 doing mail delivery.
> The failure was that the customer was too ... funds limited... to go for
> backup setups. Restart, and we were up and running, but that's the maximum
> runtime we've observed.
>
> Sorry if the comment was confusing. I don't intend it to be a
> recommendation, just an observation of our experience.
>
> -- db
