I have had the same experience with a heavily loaded NNTP server. We have to
reboot the system after about 10-12 weeks. It looks like a memory leak in the
kernel (2.4.7).

Robert,
who has a machine running Solaris 2.5.1 with an uptime of 1386 (!) days ;-)



----- Original Message -----
From: "David Boyes" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, April 23, 2002 6:28 PM
Subject: Re: MTBF


> > > The record for us is about 9 months for a single Linux image. Average
> > > is about 3-4 months between reboots, depending on what's running in
> > > them -- things that suck up lots of memory like Websphere tend to
> > > shorten the lifespan of the machine by fragmenting storage. Machines
> > > that get a lot of interactive use tend to collect a few zombies after
> > > a while, so reboots become a reasonably good idea after a while.
> >
> >  I have to say that I'm a little surprised at that recommendation.
>
> No, THIS IS NOT A  RECOMMENDATION. This is a descriptive observation.
>
> The failures we see appear to be memory related, and there are some cases
> where if you cut interactive users loose and let them do their stuff, they
> create random garbage, et al. This is pretty standard stuff for lots of
> interactive processing sites -- clear the decks periodically even if it's
> not sick.
>
> >  Seems like I've heard lots of tales of people with Linux up
> >  much longer than 9 months... doing web services, etc...  do you
> >  think your 9 month figure is a function of the 390 version
> >  of Linux, or Linux in general?
>
> No, I think it's a function of how we make upgrade decisions and/or ops
> policy.  I suspect that you could go longer, but I wanted to share a data
> point.
>
> > That is, would you recommend
> >  rebooting a PC version of Linux on the same interval (given
> >  the same workload?)
>
> Given my workload, probably. My users are rude, cranky, and badly behaved.
> They can break anything...8-).
>
> >  Also - just to ask - what about the BSD variants - would you
> >  also recommend 9 months for them?
>
> See above. If the users do stupid things, you're about in the same
> position no matter what the OS.
>
> >  Could you relate more about this 9 month figure?  Do you have
> >  specific instances where it was required, that you can share
> >  of course...
>
> The 9 month one was a power failure on site with a P390 doing mail
> delivery. The failure was that the customer was too ... funds limited...
> to go for backup setups.  Restart, and we were up and running, but that's
> the maximum runtime we've observed.
>
> Sorry if the comment was confusing.  I don't intend it to be a
> recommendation, just an observation of our experience.
>
> -- db
>
