This is just about the last part of your post, about the 4.9 kernel and CentOS.
Are you using the stable 4.9 kernel, or are you hoping patches get pulled into the CentOS 4.9 kernel? If it's the latter, you need to file a bug with Red Hat to have the patches pulled into RHEL, and then CentOS should get those changes as well. We have no direct control over the RHEL/CentOS kernels. If it's the former, someone (most likely you, since you're the one who needs the patches) has to identify the patches that should be pulled into the stable 4.9 kernel and email the maintainer of the stable kernels.

I never said Intel is not monitoring the communities. I said the networking group is not monitoring the communities. At the very least, I am not monitoring the communities at all and only look when someone points things out to me.

Also, if you're running HP hardware, you may want to file a bug with HP, as the firmware updates have to come from HP and this may be a firmware issue.

Todd Fujinaka
Software Application Engineer
Datacenter Engineering Group
Intel Corporation
todd.fujin...@intel.com

-----Original Message-----
From: Pavlos Parissis [mailto:pavlos.paris...@gmail.com]
Sent: Wednesday, October 25, 2017 2:45 PM
To: e1000-devel@lists.sourceforge.net
Subject: [E1000-devel] Instability of i40e driver on 4.9 kernel

Hi all,

I mailed netdev and intel-wired-lan about stability issues with the i40e driver on 4.9 kernels, and Todd Fujinaka suggested mailing this list about our issue instead.

We have been running 4.9 kernels for several months on CentOS 7.3 and for a few weeks on CentOS 7.4. After we replaced our 10GbE copper cards (X540-AT2, ixgbe driver) with X710 10GbE SFP cards (i40e driver), we noticed severe instability on our servers. On several servers the links went down and came back up without any obvious reason, except for a lot of errors in kernel.log:

[..snip..]
2017-10-04T15:50:46.839998+02:00 kernel: i40e 0000:04:00.1 eth0: tx_timeout recovery level 3, hung_queue 11
2017-10-04T15:50:50.119447+02:00 kernel: i40e 0000:04:00.0: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM
2017-10-04T15:50:50.119455+02:00 kernel: i40e 0000:04:00.0: DCB init failed -53, disabled
2017-10-04T15:50:50.301798+02:00 kernel: i40e 0000:04:00.0 eth1: NIC Link is Down
2017-10-04T15:50:50.423744+02:00 kernel: i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM
2017-10-04T15:50:50.423752+02:00 kernel: i40e 0000:04:00.1: DCB init failed -53, disabled
2017-10-04T15:50:50.600812+02:00 kernel: i40e 0000:04:00.1 eth0: NIC Link is Down
2017-10-04T15:50:50.764799+02:00 kernel: i40e 0000:04:00.1 eth0: NIC Link is Up 10 Gbps Full Duplex, Flow Control: None
2017-10-04T15:50:53.234804+02:00 kernel: i40e 0000:04:00.0 eth1: NIC Link is Up 10 Gbps Full Duplex, Flow Control: None
2017-10-04T15:51:17.201808+02:00 kernel: i40e 0000:04:00.1: TX driver issue detected, PF reset issued
[..snip..]

We run the BIRD Internet Routing Daemon on our servers to establish BGP peerings with routers, and we have also observed flapping BGP peerings. At the same times as the BGP peering instability, we saw kernel errors:

2017-10-06T07:36:10.526657+02:00 kernel: [60720.957855] i40e 0000:04:00.1: DCB init failed -53, disabled
2017-10-06T07:36:12.127091+02:00 kernel: [60722.553258] i40e 0000:04:00.1: TX driver issue detected, PF reset issued
2017-10-06T07:36:12.509188+02:00 kernel: [60722.891523] i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM

We decided to go back to the CentOS 3.10 kernel, but that process wasn't smooth, as the latest firmware gave us problems with speed detection. We rolled the firmware back two versions, and the speed detection issue was resolved. We have been running 3.10 for several weeks without any problems.
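When lining up link flaps and BGP flaps with driver errors, it can help to tally the i40e messages quoted above per PCI device. The following is a minimal sketch, assuming kernel.log is plain text in the syslog format shown in this report; the function name `tally_i40e_events` and the event category names are hypothetical labels chosen for illustration, not anything defined by the i40e driver or by Intel.

```python
import re
from collections import Counter

# Hypothetical event categories, keyed on message text quoted in this report.
# The grouping is an illustrative assumption, not an Intel classification.
PATTERNS = {
    "tx_timeout": re.compile(r"tx_timeout recovery level \d+"),
    "pf_reset": re.compile(r"TX driver issue detected, PF reset issued"),
    "dcb_query_failed": re.compile(r"Query for DCB configuration failed"),
    "dcb_init_failed": re.compile(r"DCB init failed"),
    "link_down": re.compile(r"NIC Link is Down"),
    "link_up": re.compile(r"NIC Link is Up"),
}

# The PCI address printed after the driver name, e.g. "i40e 0000:04:00.1".
DEVICE = re.compile(r"i40e (\d{4}:[0-9a-f]{2}:[0-9a-f]{2}\.\d)")


def tally_i40e_events(lines):
    """Count (PCI device, event category) pairs in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        dev = DEVICE.search(line)
        if not dev:
            continue  # not an i40e message
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[(dev.group(1), name)] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the 2017-10-06 excerpt above.
    sample = [
        "2017-10-06T07:36:10.526657+02:00 kernel: [60720.957855] i40e 0000:04:00.1: DCB init failed -53, disabled",
        "2017-10-06T07:36:12.127091+02:00 kernel: [60722.553258] i40e 0000:04:00.1: TX driver issue detected, PF reset issued",
        "2017-10-06T07:36:12.509188+02:00 kernel: [60722.891523] i40e 0000:04:00.1: Query for DCB configuration failed, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_EPERM",
    ]
    for (device, event), n in sorted(tally_i40e_events(sample).items()):
        print(device, event, n)
```

To use it on a real log, pass an open file object, e.g. `tally_i40e_events(open("/var/log/kernel.log"))` (the path is an assumption; adjust to your syslog setup). Matching on message text rather than on error codes keeps the sketch tied to what the driver actually printed in this report.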
Even though we want certain functionality from kernel 4.9, we decided to switch back to 3.10, as the stability of our systems has higher priority. I should mention that in all occurrences of the issue we didn't see any anomalies, such as DDoS attacks.

I have opened https://communities.intel.com/message/501682#501682, where you can find all the error messages and other information. Todd Fujinaka asked me to provide reproduction steps, but we only hit the issues when we had real customer traffic on our servers.

Has anyone seen those errors and observed this kind of instability?

Since we noticed the issues, I have been following the netdev ML, and I know that there are a lot of improvements/patches queued up for 4.14. I am hoping those patches fix our issue and, most importantly, are sent to linux-stable for inclusion in the 4.9 kernel.

Cheers,
Pavlos

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired