> -----Original Message-----
> From: Simon Utting [mailto:[email protected]]
> Sent: Thursday, September 13, 2012 2:39 PM
> To: [email protected]
> Subject: [E1000-devel] Adapter reset on 82576 and 82580 bonded pair
> 
> Hi,
> 
> Apologies if this is missing any information, I will try to be as thorough as
> possible. We have hit a wall and are looking for guidance in continuing
> troubleshooting, because the driver seems to be resetting the adapter. This is
> speculation without a deeper understanding :-)
> 
[...]
> - on the majority of machines, at regular, but unpredictable, intervals we see
> unresponsive network connectivity from the physical machines (and therefore
> obviously the VMs they host)
[...]
> I appreciate that there will need to be further diagnostic work done to
> ascertain the problem. Any guidance is appreciated.
> 
Hello Simon,

I apologize for the delay in responding. Your setup is complicated and I need
to consult a few experts for some advice. What version of Xen are you
running? What Linux kernel version are you using for Dom0? It seems possible
that our interrupts are getting dropped somewhere along the way, possibly by 
Xen, as our drivers run in Dom0. If this happens, the driver is stuck until 
something (probably the watchdog) fires the interrupt vector again. Depending 
upon the timing, this can either result in a short pause, or (if the rings fill 
up) a spurious TX hang. In this case, it's not really a TX hang: the ISR is
delayed so long that, when it finally starts cleaning and sees how old the
descriptors are, the driver concludes the hardware is hung.
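
To make the timing argument concrete, here is a rough toy model in Python. It
is only a sketch under stated assumptions, not driver code: the function
names, the packet rate, the ring size and the two watchdog intervals are all
made-up illustration values, and "ring fills up" stands in for the condition
under which a hang would be reported.

```python
# Toy model only: all names and numbers here are hypothetical.

def stall_until_watchdog(lost_at_ms, watchdog_interval_ms):
    """Milliseconds from a lost interrupt until the next watchdog tick.

    Assumes the watchdog ticks at every multiple of its interval and that
    the tick is what gets the interrupt vector fired again.
    """
    next_tick_ms = (lost_at_ms // watchdog_interval_ms + 1) * watchdog_interval_ms
    return next_tick_ms - lost_at_ms


def classify_stall(stall_ms, tx_rate_pps, ring_size):
    """Label a stall as a short pause or a spurious TX hang.

    Assumes packets keep arriving at a constant rate during the stall; if
    that many packets would fill the TX ring, call it a spurious hang.
    """
    queued = stall_ms * tx_rate_pps // 1000
    return "spurious TX hang" if queued >= ring_size else "short pause"


if __name__ == "__main__":
    # Same lost interrupt, two hypothetical watchdog intervals.
    for interval_ms in (2000, 100):
        stall = stall_until_watchdog(250, interval_ms)
        print(interval_ms, stall, classify_stall(stall, 1000, 256))
```

In this model the worst-case stall is bounded by the watchdog interval, so a
shorter interval turns most would-be hangs into short pauses without
addressing why the interrupt went missing.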

Things to try:
- Make sure you are running the latest stable Xen and Dom0, along with our
latest driver on everything.
- Switch to MSI or legacy interrupts.
- Could you migrate one of the machines to KVM to see if the problem goes away?
I understand this may not be possible, but it would help eliminate Xen from the
problem.
- Change the watchdog timer to a much shorter interval, maybe 1/10 of a second
or something like that. This won't eliminate the underlying problem but will
make the delays a lot shorter and easier to overlook. If this appears to solve
the problem, it's kind of a smoking gun that our interrupts are disappearing.
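
One way to look for disappearing interrupts from Dom0 is to take two
snapshots of /proc/interrupts a few seconds apart and check which of the
adapter's vectors stopped counting. The Python below is a minimal sketch of
that idea; the helper names are hypothetical, it assumes the usual layout of
a CPU header line followed by one line per IRQ with the device name in the
last column, and "eth0" is only an example interface name.

```python
# Hypothetical helpers for comparing two /proc/interrupts snapshots.

def irq_totals(text, name_fragment):
    """Sum per-CPU counts for every IRQ line whose last field contains
    name_fragment. Returns {vector_name: total_count}."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header line lists CPU0, CPU1, ...
    totals = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) < 2 or not fields[0].endswith(":"):
            continue
        name = fields[-1]
        if name_fragment not in name:
            continue
        counts = [int(f) for f in fields[1:1 + ncpu] if f.isdigit()]
        totals[name] = sum(counts)
    return totals


def stalled_vectors(before, after):
    """Names of vectors whose total did not increase between snapshots."""
    return [name for name in sorted(before)
            if after.get(name, 0) <= before[name]]


if __name__ == "__main__":
    import time
    with open("/proc/interrupts") as f:
        first = irq_totals(f.read(), "eth0")
    time.sleep(5)
    with open("/proc/interrupts") as f:
        second = irq_totals(f.read(), "eth0")
    print(stalled_vectors(first, second))
```

A vector that is simply idle will also show no growth, so this is only
meaningful while traffic is flowing on the affected interface.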

Let me know how it goes.

Thanks,

Carolyn

Carolyn Wyborny
Linux Development
LAN Access Division
Intel Corporation



_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit 
http://communities.intel.com/community/wired