Thank you very much. I'll apply the patch and run the test again over the weekend.
I found the other thread, too. It mentions that link flapping makes the issue more likely. But I did not believe that this was causing the trouble in our case: ethtool still claims that the link is up. By flapping the link I could only reach a state where the link stays down even though I tried to bring it up again with "ip link set up".

For the future, we will backport a newer version. For now, we'll try to patch the issue with the smallest possible changes.

Thanks again for your help,
Martin

-----Original Message-----
From: Skidmore, Donald C [mailto:[email protected]]
Sent: Thursday, 28 July 2011 19:56
To: Zielinski, Martin; Kirsher, Jeffrey T
Cc: [email protected]
Subject: RE: [E1000-devel] ixgbe: not accepting any packets - increasing rx_missed_errors

Hey Martin,

Sorry about your troubles with ixgbe. I think there is a fair chance you're seeing the issue you referenced in the other email. The patch I'm referring to is a7f5a5fcd9f13afd3471a0de8c1fdaa8f989497c. When we saw this issue it was primarily on short DA cables (1m - 3m) and had to do with a FW/SW semaphore collision at link time. We were later able to recreate the failure on longer DA cables and fiber, but it happened at a much lower rate.

I'll try to address your questions below:

>> I am aware that this is an old driver version, but please give me a
>> chance to explain why I'm asking for information anyway:
>>
>> - The driver is part of the 2.6.32 stable branch.

We try to backport to stable branches for security issues or very critical failures. But other than that we don't actively push patches to these branches. This is possible as our current out-of-tree driver (on SourceForge) works with a wide range of kernels, back to the 2.4.x time frame. As well, various distros backport our upstream patches as they apply to their older kernels. This seems to cover most people's needs, although it sounds like it might not work for you.

>> - It takes 2 - 10 days to reproduce it in the lab.
>> So if we use a newer version, we cannot be sure that the problem is
>> fixed just because we don't see it anymore.

The problem I mentioned above was like that. It was very difficult to recreate, even with the special test scripts we narrowed down to do it (tight loops bringing the link up and down, verifying each step). At times it took as long as 53 hours. After this fix we ran several machines for over a week with no failures. Likewise, other people in the community have hit this failure, and upgrading to a new driver seemed to fix their problem. I might be able to dig up the Perl script I wrote to speed up the failure if you're interested.

>> - According to the customer the issue started with an update that adds
>> the memory boundary and disables packet split (errata #45). PSRTYPE
>> register is not initialized in this version. Everything in the previous
>> version worked (so with the even older driver).

This shouldn't be related to the patch I referenced above, but we haven't tested specifically for it as the fix was never backported to this branch. But the fix you're talking about (for erratum #45) went into all the stable branches as well as net-next. We are continually validating on net-next and haven't seen this failure there.

>> - It is a critical customer. If we provide a new version and it fails
>> again this will become a problem.

I can understand that. But we only test longterm kernel drivers as we add patches to them, which, as I mentioned above, isn't all that often. More of our focus goes to the out-of-tree driver, making sure it plays well with older kernels. It just comes down to using our limited resources where we get the most gain.

>> - All reports about this issue end up without resolution or the advice
>> to update the driver. I really tried to extract an explanation or the
>> exact changeset that fixes the issue. But I failed. So for documentation
>> purposes it would be a good thing to make the solution googleable.

We advised updating the driver as that is where the fix was put in.
If you want to see a list of the patches, git would be a good place to start. Everything is there, although it might be a bit overwhelming as there are probably around 1000 patches.

Hope this helps,
-Don Skidmore <[email protected]>

[... snip removed the older mails ...]
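
Editorial sketch (not part of the original mails): the reproduction approach Don describes, tight loops that bring the link down and up and verify each step, could look roughly like the Python below. This is not Don's Perl script, which is not shown in this thread. The function names, the timeout values, and the choice to verify via the "Link detected:" line of `ethtool <iface>` are all assumptions made for illustration. Note that it only checks the reported link state; it would not catch the case Martin describes, where ethtool reports the link as up while no packets are accepted.

```python
import subprocess
import time
from typing import Callable, List

# A runner takes an argv list and returns the command's stdout.
Runner = Callable[[List[str]], str]


def run_command(argv: List[str]) -> str:
    """Run a command and return its stdout (raises on a non-zero exit)."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def link_detected(ethtool_output: str) -> bool:
    """Parse the 'Link detected:' line printed by `ethtool <iface>`."""
    for line in ethtool_output.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "Link detected":
            return value.strip() == "yes"
    raise ValueError("no 'Link detected' line in ethtool output")


def wait_for_link(iface: str, want_up: bool, run: Runner,
                  timeout_s: float, poll_s: float = 0.5) -> bool:
    """Poll ethtool until the link state matches want_up or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if link_detected(run(["ethtool", iface])) == want_up:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def flap_link(iface: str, cycles: int, run: Runner = run_command,
              timeout_s: float = 10.0) -> int:
    """Flap the link `cycles` times, verifying each step.

    Returns the 1-based cycle at which verification failed, or 0 if
    every cycle passed. The timeout is an arbitrary illustrative value.
    """
    for cycle in range(1, cycles + 1):
        run(["ip", "link", "set", "dev", iface, "down"])
        if not wait_for_link(iface, False, run, timeout_s):
            return cycle
        run(["ip", "link", "set", "dev", iface, "up"])
        if not wait_for_link(iface, True, run, timeout_s):
            return cycle
    return 0
```

Run as root on a test machine, something like `flap_link("eth2", 100000)` (the interface name is a placeholder) would stop at the first cycle where the link fails to come back. The runner is injectable so the loop can be exercised without touching real hardware.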
