You may want to post this on intel-wired-...@osuosl.org, but if you have a Bugzilla open, I think we should already be aware of this.
Todd Fujinaka
Software Application Engineer
Data Center Group
Intel Corporation
todd.fujin...@intel.com

-----Original Message-----
From: Ivan Pazos Atanes <ipazo...@redhat.com>
Sent: Tuesday, July 27, 2021 3:52 AM
To: e1000-devel@lists.sourceforge.net
Subject: [E1000-devel] NICs go down on OOM

Hi all,

My name is Iván. I am an OpenShift consultant working with a customer that is facing the following issue: when a pod starts to OOM, network interfaces start going down until the node enters the 'Not Ready' state.

Driver information:

sh-4.4# modinfo i40e
filename:       /lib/modules/4.18.0-193.51.1.el8_2.x86_64/kernel/drivers/net/ethernet/intel/i40e/i40e.ko.xz
version:        2.8.20-k
license:        GPL v2
description:    Intel(R) Ethernet Connection XL710 Network Driver
author:         Intel Corporation, <e1000-devel@lists.sourceforge.net>
rhelversion:    8.2

This is the dmesg output:

[Tue Jul 27 09:36:43 2021] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podd28f9b32_f407_44bd_a64d_cf05a10f2a5f.slice/crio-6ffbabbb06eb557c53304b0b253122a82ac4ea5d31535503f812a97dff9ac4c.scope: cache:0KB rss:132KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:904KB inactive_file:0KB active_file:0KB unevictable:0KB
[Tue Jul 27 09:36:43 2021] [ pid ]   uid  tgid  total_vm   rss  pgtables_bytes  swapents  oom_score_adj  name
[Tue Jul 27 09:36:43 2021] [21805]     0 21805     35869   651          176128         0          -1000  conmon
[Tue Jul 27 09:36:43 2021] [21806]     0 21806    383963  5780          249856         0          -1000  runc
[Tue Jul 27 09:36:43 2021] [21835]     0 21835      5029   855           65536         0          -1000  exe
[Tue Jul 27 09:36:43 2021] Out of memory and no killable processes...
[Tue Jul 27 09:36:43 2021] i40e 0000:14:00.1: Query for DCB configuration failed, err I40E_ERR_NOT_READY aq_err OK
[Tue Jul 27 09:36:44 2021] i40e 0000:14:00.1: DCB init failed -63, disabled
[Tue Jul 27 09:36:44 2021] bond0: (slave eno2): link status definitely down, disabling slave
[Tue Jul 27 09:36:45 2021] i40e 0000:14:00.0: Query for DCB configuration failed, err I40E_ERR_NOT_READY aq_err OK
[Tue Jul 27 09:36:45 2021] i40e 0000:14:00.0: DCB init failed -63, disabled
[Tue Jul 27 09:36:45 2021] i40e 0000:14:00.1 eno2: port 4789 already offloaded
[Tue Jul 27 09:36:45 2021] i40e 0000:14:00.1 eno2: port 4789 already offloaded
[Tue Jul 27 09:36:45 2021] i40e 0000:14:00.1: FW LLDP is enabled
[Tue Jul 27 09:36:45 2021] bond0: (slave eno1): link status definitely down, disabling slave
[Tue Jul 27 09:36:45 2021] i40iw_deinit_device: state = 11
[Tue Jul 27 09:36:45 2021] bond0: (slave eno2): link status definitely up, 10000 Mbps full duplex
[Tue Jul 27 09:36:45 2021] ib_srpt srpt_remove_one(i40iw1): nothing to do.
[Tue Jul 27 09:36:45 2021] device vethda19890a entered promiscuous mode
[Tue Jul 27 09:36:45 2021] i40e 0000:14:00.0 eno1: port 4789 already offloaded
[Tue Jul 27 09:36:45 2021] i40e 0000:14:00.0 eno1: port 4789 already offloaded
[Tue Jul 27 09:36:45 2021] i40e 0000:14:00.0: FW LLDP is enabled
[Tue Jul 27 09:36:45 2021] i40iw_deinit_device: state = 11
[Tue Jul 27 09:36:45 2021] ib_srpt srpt_remove_one(i40iw0): nothing to do.
[Tue Jul 27 09:36:45 2021] i40iw_initialize_dev: DCB is set/clear = 0
[Tue Jul 27 09:36:45 2021] i40iw_wait_pe_ready: [1283] fm load status[x0703]
[Tue Jul 27 09:36:45 2021] i40iw_wait_pe_ready: [1288] I40E_GLPE_CPUSTATUS1 status[x0080]
[Tue Jul 27 09:36:45 2021] i40iw_wait_pe_ready: [1291] I40E_GLPE_CPUSTATUS2 status[x0080]
[Tue Jul 27 09:36:45 2021] bond0: (slave eno1): link status definitely up, 10000 Mbps full duplex
[Tue Jul 27 09:36:45 2021] i40iw_wait_pe_ready: [1283] fm load status[x0703]
[Tue Jul 27 09:36:45 2021] i40iw_wait_pe_ready: [1288] I40E_GLPE_CPUSTATUS1 status[x0080]
[Tue Jul 27 09:36:45 2021] i40iw_wait_pe_ready: [1291] I40E_GLPE_CPUSTATUS2 status[x0080]
[Tue Jul 27 09:36:45 2021] ib_srpt MAD registration failed for i40iw0-1.
[Tue Jul 27 09:36:45 2021] ib_srpt srpt_add_one(i40iw0) failed.
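[Editorial note: the correlation above — the OOM event at 09:36:43 immediately followed by the DCB/I40E_ERR_NOT_READY failures — can be checked automatically against a saved dmesg capture. A minimal sketch; the function name and the 20-line window are illustrative choices, not from the original post:]

```shell
# flag_oom_nic_correlation: given a saved dmesg capture, print any i40e
# error lines (DCB init failures, I40E_ERR_NOT_READY) that appear within
# the 20 lines following an "Out of memory" message. Illustrative only.
flag_oom_nic_correlation() {
    grep -A 20 'Out of memory' "$1" | grep -E 'i40e.*(DCB|I40E_ERR_NOT_READY)'
}
```

If this prints the DCB lines, the NIC errors reliably follow the OOM event in the capture; an empty result means the two are not adjacent in the log.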
[Tue Jul 27 09:36:45 2021] i40iw_open: i40iw_open completed
[Tue Jul 27 09:36:45 2021] ib_srpt MAD registration failed for i40iw1-1.
[Tue Jul 27 09:36:45 2021] ib_srpt srpt_add_one(i40iw1) failed.
[Tue Jul 27 09:36:45 2021] i40iw_open: i40iw_open completed
[Tue Jul 27 09:36:49 2021] exe invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=-1000
[Tue Jul 27 09:36:50 2021] exe cpuset=/ mems_allowed=0-3
[Tue Jul 27 09:36:50 2021] CPU: 53 PID: 21835 Comm: exe Tainted: G W L --------- - - 4.18.0-193.51.1.el8_2.x86_64 #1
[Tue Jul 27 09:36:50 2021] Hardware name: HPE ProLiant XL170r Gen10/ProLiant XL170r Gen10, BIOS U38 10/26/2020
[Tue Jul 27 09:36:50 2021] Call Trace:

I looked for information about these error codes and found very little on the Internet. Maybe you already know about this problem. Any information would be very helpful!

BR,

--
Ivan Pazos
Senior OpenShift Consultant
Red Hat Iberia <https://www.redhat.com/>
ivan.pa...@redhat.com
Mobile: +34647962071

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel Ethernet, visit https://forums.intel.com/s/topic/0TO0P00000018NbWAI/intel-ethernet
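[Editorial note: following the suggestion at the top of the thread to take this to Bugzilla, a minimal sketch of collecting the driver and firmware details such a report usually needs. The interface name (eno1, from the bond0 slaves in the log) and the output filename are assumptions, not from the thread:]

```shell
#!/bin/sh
# Illustrative collection script for an i40e bug report: driver module
# info, firmware/driver version as loaded, private flags (which show the
# FW LLDP state mentioned in the dmesg output), and recent kernel messages.
out=i40e-report.txt
{
    echo '== driver module =='
    modinfo i40e
    echo '== firmware / driver version as loaded =='
    ethtool -i eno1
    echo '== private flags (FW LLDP state) =='
    ethtool --show-priv-flags eno1
    echo '== recent i40e / bonding / OOM kernel messages =='
    dmesg | grep -iE 'i40e|bond0|out of memory' | tail -n 200
} > "$out" 2>&1
echo "wrote $out"
```

Errors from commands that need root (dmesg) or a real i40e interface (ethtool) are captured into the report rather than aborting it, so the script can be run as-is and the output attached to the bug.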