(Sorry, resending: I originally sent this from the wrong email address.)

Just checked the logs again, and the "Copper Link is Down" message hasn't appeared every time there was a problem, so it may not be related.

Ryan

Ryan Pugatch wrote:
> FWIW, I am also having the same issue with some R710s. They are part
> of a Hadoop cluster. Interestingly enough, only 2 out of the 3
> servers in that cluster have experienced the issue so far. We also
> run our corporate mail server on an R710 and that has not shown any
> problems yet (except for a weird issue where outgoing TCP connections
> would intermittently fail until we restarted the network interfaces;
> not sure if this is related, it has only happened once).
>
> We are running CentOS 5.3. All three Hadoop machines are running
> 2.6.18-128.2.1.el5 and the mail server is running 2.6.18-128.1.10.el5
>
> It seems that when the network would drop it would log:
>
> kernel: bnx2: eth0 NIC Copper Link is Down
>
> Not sure that the disable_msi option will fix the two Hadoop machines
> having the issues, as the problem happens somewhat randomly and is not
> easily reproducible. That being said, we aren't getting some
> network-related errors in our Hadoop logs that we had been getting
> previously, so I suspect that is a good sign. Time will tell!
>
> Is this issue related to the 2.6.28-rc3 regression specified here?
> http://lkml.indiana.edu/hypermail/linux/kernel/0811.0/01374.html
>
> I am hoping a fix will make its way to RHEL and downstream to CentOS
> (has anyone heard if that is happening? I'm having trouble finding a
> Red Hat or CentOS bug logged).
>
> Are there any performance concerns with using disable_msi? I know that
> the driver from Dell.com should fix the problem but I'd prefer to use a
> driver provided from upstream.
>
> Ryan Pugatch
> Systems Administrator, TripAdvisor
>
>
> [email protected] wrote:
>> Hello,
>>
>> Thanks, this info is of great help.
>>
>> With regards,
>> Narendra K
>>
>> -----Original Message-----
>> From: daryl herzmann [mailto:[email protected]]
>> Sent: Thursday, August 13, 2009 7:07 PM
>> To: K, Narendra
>> Cc: Biligiri, Raghavendra; linux-poweredge-Lists
>> Subject: RE: FW: T410 Network Failure
>>
>> On Thu, 13 Aug 2009, [email protected] wrote:
>>
>>> Thanks. Top output need not be at the time of failure. It can be any
>>> time, just to get an idea as to what the resource utilization is so
>>> that we can replicate it. And general high-level detail about the
>>> database you are using - like, is it an Oracle database?
>> It is running PostgreSQL 8.4. sar reports that the average CPU
>> utilization for today is 0.44%. 10% of memory is used. Network
>> utilization is only a few kbps. I suspect when the failures occurred,
>> the machine got hit with a few hundred PostgreSQL connections at once,
>> but I have no way to prove it.
>>
>> sorry again,
>> daryl
>>
>> _______________________________________________
>> Linux-PowerEdge mailing list
>> [email protected]
>> https://lists.us.dell.com/mailman/listinfo/linux-poweredge
>> Please read the FAQ at http://lists.us.dell.com/faq
