Hey all,

We're seeing occasional lockups across a number of machines we have in a 
datacenter, most of which are currently running Gentoo.  The machines will 
run fine for days, weeks, even months, and then lock up solid: the box 
still pings and an nmap scan shows all the normal ports open, but no service 
responds on any port and nothing shows up in the system logs.  On the 
occasions when we've had console access to a locked-up machine, a login 
prompt would appear, but any login attempt would just hang.
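
To make the "port is open but nothing answers" symptom easier to catch 
from another box, here is a rough sketch of a probe that separates a 
completed TCP handshake (which the kernel handles by itself) from an actual 
reply from the service.  The function name, the result labels and the 
timeout are my own placeholders, and it only makes sense against services 
that send a banner first, such as SSH or SMTP.

```python
import socket


def probe(host, port, timeout=5.0):
    """Classify a TCP port as 'no-connect', 'open-but-silent' or 'responding'.

    'open-but-silent' means the handshake completed but no data arrived
    before the timeout (or the peer closed without sending anything).
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unreachable, or the connect itself timed out.
        return "no-connect"
    with sock:
        sock.settimeout(timeout)
        try:
            data = sock.recv(64)
        except socket.timeout:
            return "open-but-silent"
        except OSError:
            return "no-connect"
    return "responding" if data else "open-but-silent"
```

Run from cron against port 22 on each box, something like this would at 
least timestamp when a machine goes from "responding" to "open-but-silent".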

This would normally suggest a hardware problem to me, but it has been 
happening across a wide array of both well-tested and new machines.  In 
addition, it happens on machines that are running Red Hat 7.1 through 9.0 as 
well as Gentoo.  The lockups seem random, and there is almost always close to 
zero load on the machine when it happens (only once were we actively using 
the machine at the time, and it locked up while uncompressing a tar file).

The Gentoo systems use the deadline I/O scheduler, as it's deemed the most 
reliable, but the lockups have also occurred with the default anticipatory 
I/O scheduler.
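
For confirming which scheduler is actually in effect on a given box, a 
small sketch follows.  It assumes the kernel exposes the per-device sysfs 
file /sys/block/<device>/queue/scheduler, where the active scheduler is 
shown in square brackets; kernels that only select the scheduler with the 
elevator= boot parameter may not have that file.  The function names and 
the "hda" device are just placeholders.

```python
import re
from pathlib import Path


def active_scheduler(text):
    """Return the bracketed name from a line like
    'noop anticipatory [deadline] cfq', or None if nothing is bracketed."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def scheduler_for(device, sysfs_root="/sys/block"):
    """Read and parse the scheduler file for one block device.

    Raises FileNotFoundError if the kernel does not provide the file.
    """
    path = Path(sysfs_root) / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())
```

For example, scheduler_for("hda") would read 
/sys/block/hda/queue/scheduler and return the bracketed entry.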

The only common factor seems to be that they are all plugged into a 
questionable HP ProCurve switch that we've been contemplating replacing.  
Would replacing it simply be a waste of our time (I wouldn't expect a buggy 
switch to be able to lock up boxes...)?  Any recommendations for what to 
investigate at this point?

Cheers,
-- 
Casey Allen Shobe | SeattleServer, Inc.
[EMAIL PROTECTED] | cell 425-443-4653
http://www.seattleserver.com