Hello,

The issue is not seen when the driver from support.dell.com (version 1.8.7b) is used. If this driver is used, there is no need to pass disable_msi=1.
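For reference, a minimal sketch of how one might confirm the option is actually configured. It assumes module options are set with an "options" line in modprobe configuration text (on RHEL 5.x this is typically /etc/modprobe.conf, e.g. "options bnx2 disable_msi=1"). The helper name has_module_option and the sample text are hypothetical and not part of any driver package.

```python
def has_module_option(conf_text, module, option, value):
    """Return True if conf_text contains an 'options <module>' line
    that sets <option>=<value>. Text after '#' is treated as a comment."""
    wanted = f"{option}={value}"
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "options" and fields[1] == module:
            if wanted in fields[2:]:
                return True
    return False


if __name__ == "__main__":
    # Hypothetical sample; in practice, pass in the contents of the
    # modprobe configuration file used on the system.
    sample = "alias eth0 bnx2\noptions bnx2 disable_msi=1\n"
    print(has_module_option(sample, "bnx2", "disable_msi", "1"))
```

This only checks the configuration text; the driver still has to be reloaded (or the system rebooted) before a changed option takes effect.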
The RHEL 5.3 native driver should be loaded with disable_msi=1 to avoid the issue.

With regards,
Narendra K

>-----Original Message-----
>From: Ryan Pugatch [mailto:[email protected]]
>Sent: Thursday, September 03, 2009 8:48 PM
>To: K, Narendra
>Cc: [email protected]; linux-poweredge-Lists
>Subject: Re: FW: T410 Network Failure
>
>Is this issue fixed by using the new driver from
>support.dell.com and NOT having disable_msi?
>
>Thanks,
>
>Ryan
>
>[email protected] wrote:
>> Hello,
>>
>> Yes, there might not be a link down message every time this issue is
>> seen. In a failed state, you cannot ping to the system and you cannot
>> ping from the system. With disable_msi=1 we have not seen the issue.
>> When the issue occurs, except that the system becomes unreachable,
>> there might not be any logs in dmesg or syslog. Issue is not seen with
>> upstream kernel. Dell and Red Hat are working on this, and we should
>> know soon what is going on.
>>
>> With regards,
>> Narendra K
>>
>>> -----Original Message-----
>>> From: Ryan Pugatch [mailto:[email protected]]
>>> Sent: Wednesday, September 02, 2009 2:33 AM
>>> To: K, Narendra
>>> Cc: [email protected]; linux-poweredge-Lists
>>> Subject: Re: FW: T410 Network Failure
>>>
>>> (sorry, resent as I sent from wrong email originally)
>>>
>>> Just checked logs again and the copper link down message hasn't
>>> happened every time there was a problem, so that may not be related.
>>>
>>> Ryan
>>>
>>>
>>> Ryan Pugatch wrote:
>>>> FWIW, I am also having the same issue with some R710s. They are
>>>> part of a Hadoop cluster. Interestingly enough, only 2 out of
>>>> the 3 servers have experienced the issue thus far in that cluster. We
>>>> also run our corporate mail server on an R710 and that has not shown
>>>> any problems yet (except for a weird issue where outgoing TCP
>>>> connections would intermittently fail until we restarted the network
>>>> interfaces; not sure if this is related--has only happened once).
>>>>
>>>> We are running CentOS 5.3. All three Hadoop machines are running
>>>> 2.6.18-128.2.1.el5 and the mail server is running
>>>> 2.6.18-128.1.10.el5.
>>>>
>>>> It seems that when the network would drop it would log:
>>>>
>>>> kernel: bnx2: eth0 NIC Copper Link is Down
>>>>
>>>> Not sure that the disable_msi option will fix the two Hadoop machines
>>>> having the issues, as the problem happens somewhat randomly and is not
>>>> easily reproducible. That being said, we aren't getting some
>>>> network-related errors in our Hadoop logs that we had been getting
>>>> previously, so I suspect that is a good sign. Time will tell!
>>>>
>>>> Is this issue related to the 2.6.28-rc3 regression specified here?
>>>> http://lkml.indiana.edu/hypermail/linux/kernel/0811.0/01374.html
>>>>
>>>> I am hoping a fix will make its way to RHEL and downstream to CentOS
>>>> (has anyone heard if that is happening? I'm having trouble finding a
>>>> Red Hat or CentOS bug logged).
>>>>
>>>> Are there any performance concerns with using disable_msi? I know
>>>> that the driver from Dell.com should fix the problem but I'd prefer to
>>>> use a driver provided from upstream.
>>>>
>>>> Ryan Pugatch
>>>> Systems Administrator, TripAdvisor
>>>>
>>>>
>>>> [email protected] wrote:
>>>>> Hello,
>>>>>
>>>>> Thanks, this info is of great help.
>>>>>
>>>>> With regards,
>>>>> Narendra K
>>>>>
>>>>> -----Original Message-----
>>>>> From: daryl herzmann [mailto:[email protected]]
>>>>> Sent: Thursday, August 13, 2009 7:07 PM
>>>>> To: K, Narendra
>>>>> Cc: Biligiri, Raghavendra; linux-poweredge-Lists
>>>>> Subject: RE: FW: T410 Network Failure
>>>>>
>>>>> On Thu, 13 Aug 2009, [email protected] wrote:
>>>>>
>>>>>> Thanks. Top output need not be at the time of failure. It can be any
>>>>>> time, just to get an idea of the resource utilization so that
>>>>>> we can replicate it.
>>>>>> And general high level detail about the
>>>>>> database you are using - like is it an Oracle database?
>>>>> It is running PostgreSQL 8.4. sar reports that the average CPU
>>>>> utilization for today is 0.44%. 10% of memory is used. Network
>>>>> utilization is only a few kbps. I suspect when the failures occurred,
>>>>> the machine got hit with a few hundred PostgreSQL connections at
>>>>> once, but I have no way to prove it.
>>>>>
>>>>> sorry again,
>>>>> daryl
>>>>>
>>>>> _______________________________________________
>>>>> Linux-PowerEdge mailing list
>>>>> [email protected]
>>>>> https://lists.us.dell.com/mailman/listinfo/linux-poweredge
>>>>> Please read the FAQ at http://lists.us.dell.com/faq
