We are also seeing something similar on Dell 1750s running
recent 2.4 kernels.  Our monitoring software shows that the servers
run for 1-6 months straight without problems and then suddenly allocate all
memory to something and hang, forcing us to use the DRACs to
reboot.
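
For anyone who wants a record of memory use leading up to a hang like this, below is a minimal sketch (not what our monitoring software actually does) that samples /proc/meminfo and appends one line per sample to a log, syncing each line to disk so the last samples survive a hard reboot. The function names, the log path and the interval are all hypothetical, and it assumes a Python 3 interpreter is available on the box.

```python
import os
import re
import time

# Matches lines such as "MemFree:   12345 kB". Anything else (for example
# the byte-count summary table that 2.4 kernels print first) is skipped.
_KB_LINE = re.compile(r"^([\w()]+):\s+(\d+)\s+kB$")


def parse_meminfo(text):
    """Return a dict of field name -> value in kB from /proc/meminfo text."""
    fields = {}
    for line in text.splitlines():
        match = _KB_LINE.match(line.strip())
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def used_fraction(fields):
    """Fraction of RAM in use, not counting buffers and page cache."""
    reclaimable = (fields["MemFree"]
                   + fields.get("Buffers", 0)
                   + fields.get("Cached", 0))
    return 1.0 - reclaimable / fields["MemTotal"]


def log_samples(log_path, interval_seconds, count):
    """Append `count` samples to log_path, syncing each one to disk."""
    with open(log_path, "a") as log:
        for _ in range(count):
            with open("/proc/meminfo") as src:
                fields = parse_meminfo(src.read())
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write("%s used=%.3f free_kb=%d\n"
                      % (stamp, used_fraction(fields), fields["MemFree"]))
            log.flush()
            os.fsync(log.fileno())  # make sure the line reaches the disk
            time.sleep(interval_seconds)
```

Calling something like log_samples("/var/log/memwatch.log", 60, 1440) would cover one day at one-minute intervals. It only shows how fast memory disappears, not which process takes it, so it is a starting point rather than a diagnosis.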

This started happening about 9 months ago; before that, the same hardware
ran for up to a year at a time without problems.  It also seems to happen more
often on Apache/PHP/MySQL systems.  Postfix and Tomcat boxes running on the same
hardware and kernels have no problems (~300 days current uptime on a few).

Regards,
Matt

--- Original Message---
 To: [email protected]
 From: Sean Cook <[EMAIL PROTECTED]>
 Sent:  4/24/2005  1:30PM
 Subject: Re: [gentoo-server] Server lockups (still ping) (OT because not Gentoo-specific?)

>> Is it a Dell 1550 by any chance?
>> 
>> On Sun, 2005-04-24 at 10:43 -0400, Robert Sanders wrote:
>> > Casey,
>> >
>> > We've been seeing issues like this for probably the last year.  I was
>> > never able to pinpoint it to any action.  We implemented remote reboot
>> > hardware and called it a day.
>> >
>> > Some of them had strange activity, but over a larger group of machines I
>> > could never find a pattern to it.  It almost seems as if it cannot spawn
>> > any new processes.
>> >
>> > I can't help except to say you're not alone.
>> >
>> > Rob
>> >
>> > Casey Allen Shobe - SeattleServer Mailing Lists wrote:
>> > > Hey all,
>> > >
>> > > We're seeing occasional issues with a bunch of machines we have in a
>> > > datacenter, most of which are currently running Gentoo.  The machines
>> > > will run solid and fine for days, weeks, even months, and then just
>> > > lock up solid - the box still pings and an nmap scan shows all the
>> > > normal ports open, but nothing responds on any port, nothing shows up
>> > > in system logs, and the times we've had console access to a machine at
>> > > the time, a login prompt would show up, but it would just hang if you
>> > > tried to log in.
>> > >
>> > > This generally indicates hardware issues to me, but it has been happening
>> > > across a wide array of both well-tested and new machines.  In addition,
>> > > it happens on machines that are running Red Hat 7.1 through 9.0 as well
>> > > as Gentoo.  The problem seems random, and there is almost always close
>> > > to zero load on the machine when it locks up (only once were we actively
>> > > using the machine, and it locked up while uncompressing a tar file).
>> > >
>> > > The Gentoo systems use the deadline I/O scheduler as it's deemed the most
>> > > reliable, but this has shown up with the default anticipatory I/O
>> > > scheduler as well.
>> > >
>> > > The only common factor seems to be that they are all plugged into a
>> > > questionable HP Procurve switch that we've been contemplating replacing.
>> > > Would that simply be wasting our time (I don't think a buggy switch
>> > > should be able to lock up boxes...)?  Any recommendations for what to
>> > > investigate at this point?
>> > >
>> > > Cheers,
>> >
>> 
>> --
>> [email protected] mailing list
>> 
>> 
>> 


-- 
[email protected] mailing list
