Hey all,

We're seeing occasional lockups on a bunch of machines we have in a datacenter, most of which are currently running Gentoo. The machines run fine for days, weeks, even months, and then lock up solid. The box still pings and an nmap scan shows all the normal ports open, but nothing responds on any port and nothing shows up in the system logs. On the occasions we've had console access during a lockup, a login prompt would show up, but it would hang if you tried to log in.

This generally indicates hardware issues to me, but it has been happening across a wide array of machines, both well-tested and new. It also happens on machines running Red Hat 7.1 through 9.0 as well as Gentoo. The problem seems random, and there is almost always close to zero load on the machine when it locks up (only once were we actually using the machine at the time, and it locked up while uncompressing a tar file). The Gentoo systems use the deadline I/O scheduler, as it's deemed the most reliable, but this has shown up with the default anticipatory I/O scheduler as well.

The only common factor seems to be that they are all plugged into a questionable HP ProCurve switch that we've been contemplating replacing. Would replacing it simply be wasting our time? (I don't think a buggy switch should be able to lock up boxes...) Any recommendations for what to investigate at this point?

Cheers,
--
Casey Allen Shobe | SeattleServer, Inc.
[EMAIL PROTECTED] | cell 425-443-4653
http://www.seattleserver.com
