We are also seeing something similar on Dell 1750s running recent 2.4 kernels. Our monitoring software shows that the servers run for 1-6 months straight w/o problems and then suddenly allocate all memory to something and hang, forcing us to use the DRACs to reboot.
This started happening about 9 months ago; before that, the same hardware ran for up to 1 year at a time w/o problems. It also seems to happen more often on apache/php/mysql systems. Postfix and tomcat boxes running on the same hardware and kernels have no problems (~300 days current uptime on a few).

Regards,
Matt

--- Original Message ---
To: [email protected]
From: Sean Cook <[EMAIL PROTECTED]>
Sent: 4/24/2005 1:30PM
Subject: Re: [gentoo-server] Server lockups (still ping) (OT because not Gentoo-specific?)

>> Is it a dell 1550 by any chance?
>>
>> On Sun, 2005-04-24 at 10:43 -0400, Robert Sanders wrote:
>> > Casey,
>> >
>> > We've been seeing issues like this for probably the last year. I was
>> > never able to pinpoint it to any action. We implemented remote reboot
>> > hardware and called it a day.
>> >
>> > Some of them had strange activity, but over a larger group of machines I
>> > could never find a pattern to it. It almost seems as if it cannot spawn
>> > any new processes.
>> >
>> > I can't help except to say your not alone.
>> >
>> > Rob
>> >
>> > Casey Allen Shobe - SeattleServer Mailing Lists wrote:
>> > > Hey all,
>> > >
>> > > We're seeing occasional issues with a bunch of machines we have in a
>> > > datacenter, most of which are currently running Gentoo. The machines
>> > > will run solid and fine for days, weeks, even months, and then just
>> > > lock up solid - the box still pings and an nmap scan shows all the
>> > > normal ports open, but nothing responds on any port, nothing shows up
>> > > in system logs, and the times we've had console access to a machine at
>> > > the time, a login prompt would show up, but it would just hang if you
>> > > tried to log in.
>> > >
>> > > This generally indicates hardware issues to me, but it has been
>> > > happening across a wide array of both well-tested and new machines. In
>> > > addition, it happens on machines that are running Red Hat 7.1 through
>> > > 9.0 as well as Gentoo. The problem seems random, and there is almost
>> > > always close to zero load on the machine when it locks up (only once
>> > > were we presently using the machine, and it locked up while
>> > > uncompressing a tar file).
>> > >
>> > > The Gentoo systems use the deadline I/O scheduler as it's deemed the
>> > > most reliable, but this has shown up with the default anticipatory I/O
>> > > scheduler as well.
>> > >
>> > > The only common factor seems to be that they are all plugged into a
>> > > questionable HP Procurve switch that we've been contemplating
>> > > replacing. Would that simply be wasting our time (I don't think a buggy
>> > > switch should be able to lock up boxes...)? Any recommendations for
>> > > what to investigate at this point?
>> > >
>> > > Cheers,
>> >
>>
>> --
>> [email protected] mailing list
