> -----Original Message-----
> From: Simon Utting [mailto:[email protected]]
> Sent: Thursday, September 13, 2012 2:39 PM
> To: [email protected]
> Subject: [E1000-devel] Adapter reset on 82576 and 82580 bonded pair
>
> Hi,
>
> Apologies if this is missing any information, I will try to be as thorough as
> possible. We have hit a wall and are looking for guidance in continuing
> troubleshooting, because the driver seems to be resetting the adapter. This is
> speculation without a deeper understanding :-)
>
> [..}
> - on the majority of machines, at regular, but unpredictable, intervals we see
> unresponsive network connectivity from the physical machines (and therefore
> obviously the VMs they host)
[..}
> I appreciate that there will need to be further diagnostic work done to ascertain
> the problem. Any guidance is appreciated.
>

Hello Simon,

I apologize for the delay in responding. Your setup is complicated and I
need to consult a few experts for some advice.

What version of Xen are you running? What Linux kernel version are you
using for Dom0?

It seems possible that our interrupts are getting dropped somewhere along
the way, possibly by Xen, as our drivers run in Dom0. If this happens, the
driver is stuck until something (probably the watchdog) fires the interrupt
vector again. Depending upon the timing, this can result in either a short
pause or (if the rings fill up) a spurious TX hang. In that case, it's not
really a TX hang: the ISR gets delayed so long that, when it starts cleaning
and sees how old the descriptors are, the driver thinks the hardware is hung.

Things to try:
- Make sure you are running the latest stable Xen and Dom0, along with our
  latest driver on everything.
- Switch to MSI or legacy interrupts.
- Could you migrate one of the machines to KVM to see if the problem goes
  away? I understand this may not be possible, but it would help eliminate
  Xen as the cause.
- Change the watchdog timer to a much shorter interval, maybe 1/10 of a
  second or something like that. This won't eliminate the underlying
  problem, but it will make the delays a lot shorter and easier to overlook.
  If this appears to solve the problem, it's kind of a smoking gun that our
  interrupts are disappearing.

Let me know how it goes.

Thanks,

Carolyn

Carolyn Wyborny
Linux Development
LAN Access Division
Intel Corporation
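
The timing argument in the reply (a lost interrupt leaves TX cleanup stuck
until the watchdog fires the vector again) can be sketched as a toy model.
This is not driver code. The function name, the ring size, the packet rate,
the hang threshold and both watchdog periods are hypothetical values chosen
only to illustrate why a shorter watchdog interval might turn a spurious TX
hang into a short pause.

```python
from dataclasses import dataclass


@dataclass
class StallResult:
    stall_ms: int       # how long TX cleanup stays stuck after the lost interrupt
    ring_filled: bool   # True if the TX ring ran out of descriptors meanwhile
    looks_hung: bool    # True if cleanup would then see "too old" descriptors


def simulate_lost_interrupt(lost_at_ms, watchdog_ms, ring_size, tx_pps,
                            hang_threshold_ms):
    """Toy model: one interrupt is lost at lost_at_ms and nothing cleans the
    TX ring until the next watchdog tick. All parameters are hypothetical."""
    # Assumption: the watchdog ticks at exact multiples of watchdog_ms.
    next_tick_ms = (lost_at_ms // watchdog_ms + 1) * watchdog_ms
    stall_ms = next_tick_ms - lost_at_ms

    # Assumption: packets keep being queued at a constant tx_pps rate.
    time_to_fill_ms = ring_size * 1000 // tx_pps
    ring_filled = stall_ms >= time_to_fill_ms

    # Simplification: the oldest uncleaned descriptor was queued right when
    # the interrupt was lost, so its age equals the stall duration.
    looks_hung = ring_filled and stall_ms >= hang_threshold_ms
    return StallResult(stall_ms, ring_filled, looks_hung)


if __name__ == "__main__":
    # Compare a long and a short (assumed) watchdog period.
    for watchdog_ms in (2000, 100):
        result = simulate_lost_interrupt(530, watchdog_ms, 256, 1000, 1000)
        print(watchdog_ms, result)
```

With these assumed numbers, the 2000 ms watchdog leaves cleanup stuck for
1470 ms, long enough for the ring to fill and the descriptors to look stale.
The 100 ms watchdog limits the stall to 70 ms, which in this model is only a
short pause.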
_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired
