You may want to post this on intel-wired-...@osuosl.org, but if you have a Bugzilla open, we should already be aware of this.

Todd Fujinaka
Software Application Engineer
Data Center Group
Intel Corporation
todd.fujin...@intel.com

-----Original Message-----
From: Ivan Pazos Atanes <ipazo...@redhat.com> 
Sent: Tuesday, July 27, 2021 3:52 AM
To: e1000-devel@lists.sourceforge.net
Subject: [E1000-devel] NICs go down on OOM

Hi all,

My name is Iván, and I am an OpenShift consultant working with a customer that is facing the following issue: when a pod starts to OOM, network interfaces begin to go down until the node transitions to the 'NotReady' state.
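
In case it helps with reproduction, a minimal way to trigger the same OOM path on a test node is a pod with a tight memory limit whose workload allocates past it (the pod name, image, and limit here are illustrative placeholders, not the customer's actual workload):

cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: oom-test                                  # placeholder name
spec:
  restartPolicy: Never
  containers:
  - name: eat-memory
    image: registry.access.redhat.com/ubi8/ubi    # placeholder image
    command: ["/bin/sh", "-c", "tail /dev/zero"]  # tail buffers /dev/zero indefinitely, so memory grows until the limit is hit
    resources:
      limits:
        memory: "64Mi"                            # tight limit to force the memory-cgroup OOM path
EOF

# on the node, watch the kernel log for OOM and NIC/bond events in parallel
dmesg -wT | grep -Ei 'oom|i40e|bond0'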

This is the driver information:

sh-4.4# modinfo i40e
filename:       /lib/modules/4.18.0-193.51.1.el8_2.x86_64/kernel/drivers/net/ethernet/intel/i40e/i40e.ko.xz
version:        2.8.20-k
license:        GPL v2
description:    Intel(R) Ethernet Connection XL710 Network Driver
author:         Intel Corporation, <e1000-devel@lists.sourceforge.net>
rhelversion:    8.2
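
For completeness, the driver and firmware actually bound to each port can be confirmed at runtime with ethtool (eno1/eno2 as they appear in the log below):

sh-4.4# ethtool -i eno1   # prints driver, version, firmware-version, bus-info
sh-4.4# ethtool -i eno2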

And this is the dmesg output:

[Tue Jul 27 09:36:43 2021] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podd28f9b32_f407_44bd_a64d_cf05a10f2a5f.slice/crio-6ffbabbb06eb557c53304b0b253122a82ac4ea5d31535503f812a97dff9ac4c.scope: cache:0KB rss:132KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:904KB inactive_file:0KB active_file:0KB unevictable:0KB
[Tue Jul 27 09:36:43 2021] [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[Tue Jul 27 09:36:43 2021] [21805]     0 21805    35869      651   176128        0         -1000 conmon
[Tue Jul 27 09:36:43 2021] [21806]     0 21806   383963     5780   249856        0         -1000 runc
[Tue Jul 27 09:36:43 2021] [21835]     0 21835     5029      855    65536        0         -1000 exe
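
Note that all three processes in the table have oom_score_adj -1000 (OOM_SCORE_ADJ_MIN), which exempts them from the OOM killer and is consistent with the "no killable processes" message below. This can be verified per PID on a live node (PIDs taken from the table above):

sh-4.4# for p in 21805 21806 21835; do printf '%s: ' "$p"; cat /proc/$p/oom_score_adj; done
# -1000 means the process can never be selected by the OOM killer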


[Tue Jul 27 09:36:43 2021] Out of memory and no killable processes...
[Tue Jul 27 09:36:43 2021] i40e 0000:14:00.1: Query for DCB configuration failed, err I40E_ERR_NOT_READY aq_err OK
[Tue Jul 27 09:36:44 2021] i40e 0000:14:00.1: DCB init failed -63, disabled
[Tue Jul 27 09:36:44 2021] bond0: (slave eno2): link status definitely down, disabling slave
[Tue Jul 27 09:36:45 2021] i40e 0000:14:00.0: Query for DCB configuration failed, err I40E_ERR_NOT_READY aq_err OK
[Tue Jul 27 09:36:45 2021] i40e 0000:14:00.0: DCB init failed -63, disabled
[Tue Jul 27 09:36:45 2021] i40e 0000:14:00.1 eno2: port 4789 already offloaded
[Tue Jul 27 09:36:45 2021] i40e 0000:14:00.1 eno2: port 4789 already offloaded
[Tue Jul 27 09:36:45 2021] i40e 0000:14:00.1: FW LLDP is enabled
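
Since the DCB failures coincide with "FW LLDP is enabled", one thing worth checking (a general i40e diagnostic, not a confirmed fix for this issue) is the firmware LLDP agent, which i40e exposes as an ethtool private flag on builds that support it:

sh-4.4# ethtool --show-priv-flags eno2                     # look for disable-fw-lldp
sh-4.4# ethtool --set-priv-flags eno2 disable-fw-lldp on   # stops the FW LLDP agent; note this also disables firmware DCBX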

[Tue Jul 27 09:36:45 2021] bond0: (slave eno1): link status definitely down, disabling slave
[Tue Jul 27 09:36:45 2021] i40iw_deinit_device: state = 11
[Tue Jul 27 09:36:45 2021] bond0: (slave eno2): link status definitely up, 10000 Mbps full duplex
[Tue Jul 27 09:36:45 2021] ib_srpt srpt_remove_one(i40iw1): nothing to do.
[Tue Jul 27 09:36:45 2021] device vethda19890a entered promiscuous mode
[Tue Jul 27 09:36:45 2021] i40e 0000:14:00.0 eno1: port 4789 already offloaded
[Tue Jul 27 09:36:45 2021] i40e 0000:14:00.0 eno1: port 4789 already offloaded
[Tue Jul 27 09:36:45 2021] i40e 0000:14:00.0: FW LLDP is enabled
[Tue Jul 27 09:36:45 2021] i40iw_deinit_device: state = 11
[Tue Jul 27 09:36:45 2021] ib_srpt srpt_remove_one(i40iw0): nothing to do.
[Tue Jul 27 09:36:45 2021] i40iw_initialize_dev: DCB is set/clear = 0
[Tue Jul 27 09:36:45 2021] i40iw_wait_pe_ready: [1283] fm load status[x0703]
[Tue Jul 27 09:36:45 2021] i40iw_wait_pe_ready: [1288] I40E_GLPE_CPUSTATUS1 status[x0080]
[Tue Jul 27 09:36:45 2021] i40iw_wait_pe_ready: [1291] I40E_GLPE_CPUSTATUS2 status[x0080]
[Tue Jul 27 09:36:45 2021] bond0: (slave eno1): link status definitely up, 10000 Mbps full duplex
[Tue Jul 27 09:36:45 2021] i40iw_wait_pe_ready: [1283] fm load status[x0703]
[Tue Jul 27 09:36:45 2021] i40iw_wait_pe_ready: [1288] I40E_GLPE_CPUSTATUS1 status[x0080]
[Tue Jul 27 09:36:45 2021] i40iw_wait_pe_ready: [1291] I40E_GLPE_CPUSTATUS2 status[x0080]
[Tue Jul 27 09:36:45 2021] ib_srpt MAD registration failed for i40iw0-1.
[Tue Jul 27 09:36:45 2021] ib_srpt srpt_add_one(i40iw0) failed.
[Tue Jul 27 09:36:45 2021] i40iw_open: i40iw_open completed
[Tue Jul 27 09:36:45 2021] ib_srpt MAD registration failed for i40iw1-1.
[Tue Jul 27 09:36:45 2021] ib_srpt srpt_add_one(i40iw1) failed.
[Tue Jul 27 09:36:45 2021] i40iw_open: i40iw_open completed
[Tue Jul 27 09:36:49 2021] exe invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=-1000
[Tue Jul 27 09:36:50 2021] exe cpuset=/ mems_allowed=0-3
[Tue Jul 27 09:36:50 2021] CPU: 53 PID: 21835 Comm: exe Tainted: G        W    L   --------- -  - 4.18.0-193.51.1.el8_2.x86_64 #1
[Tue Jul 27 09:36:50 2021] Hardware name: HPE ProLiant XL170r Gen10/ProLiant XL170r Gen10, BIOS U38 10/26/2020
[Tue Jul 27 09:36:50 2021] Call Trace:
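
The bond0 flapping above can be cross-checked against the live bonding state and per-NIC counters on the node, for example:

sh-4.4# cat /proc/net/bonding/bond0   # per-slave MII status and link failure counts
sh-4.4# ip -s link show eno1
sh-4.4# ip -s link show eno2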

I was looking for information about these error codes, but there is very little available on the Internet.
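
For what it's worth, I40E_ERR_NOT_READY looks like a driver-internal status code rather than a standard errno, and the adjacent "-63" is most likely its numeric value; assuming access to a matching kernel source tree (the path below is the stock in-tree location), it can be confirmed with:

grep -rn 'I40E_ERR_NOT_READY' drivers/net/ethernet/intel/i40e/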

Perhaps you are already aware of this problem.

Any information would be very helpful!

BR,
--
*Ivan Pazos*

Senior Openshift Consultant

Red Hat Iberia <https://www.redhat.com/>

ivan.pa...@redhat.com
Mobile: +34647962071
Twitter: https://twitter.com/redhat | LinkedIn: https://www.linkedin.com/company/red-hat | Facebook: https://www.facebook.com/RedHatInc

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel Ethernet, visit 
https://forums.intel.com/s/topic/0TO0P00000018NbWAI/intel-ethernet
