Hi,

We have a fleet of Dell PowerEdge R640 servers, all with very similar configurations. The important detail is that they use Intel 10Gb Ethernet cards, as shown below:

lspci | grep Ether
19:00.0 Ethernet controller: Intel Corporation Ethernet Controller 10G X550T (rev 01)
19:00.1 Ethernet controller: Intel Corporation Ethernet Controller 10G X550T (rev 01)
1a:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
1a:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)

Only 2 of them are failing to auto-negotiate the correct link speed:

ethtool eno1
Settings for eno1:
        Supported ports: [ TP ]
        Supported link modes:   100baseT/Full
                                1000baseT/Full
                                10000baseT/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: Yes
        Advertised link modes:  100baseT/Full
                                1000baseT/Full
                                10000baseT/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: Unknown
        Supports Wake-on: umbg
        Wake-on: g
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: yes

ethtool eno2
Settings for eno2:
        Supported ports: [ TP ]
        Supported link modes:   100baseT/Full
                                1000baseT/Full
                                10000baseT/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: Yes
        Advertised link modes:  100baseT/Full
                                1000baseT/Full
                                10000baseT/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: Unknown
        Supports Wake-on: umbg
        Wake-on: g
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: yes

They will also sometimes lose connectivity entirely for extended periods of up to 4 hours. Here are our switch logs, which usually indicate the problem:

lacpd[20416]: %DAEMON-5-LACPD_TIMEOUT: xe-10/0/16: lacp current while timer expired current Receive State: CURRENT
/kernel: %KERN-5-KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd xe-10/0/16 - ATTACHED state - acting as standby link
rpd[1866]: %DAEMON-6: Decode ifd xe-10/0/16 index 2406: ifdm_flags 0xc000
mcsnoopd[94056]: %DAEMON-6: Decode ifd xe-10/0/16 index 2406: ifdm_flags 0xc000
mcsnoopd[94056]: %DAEMON-6: krt_decode_iflogical: xe-10/0/16.0 has got color 0
lacpd[20416]: %DAEMON-5-LACPD_TIMEOUT: xe-11/0/16: lacp current while timer expired current Receive State: CURRENT
/kernel: %KERN-5-KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd xe-11/0/16 - ATTACHED state - acting as standby link
lacpd[20416]: %DAEMON-5-LACP_INTF_DOWN: ae125: Interface marked down due to lacp timeout on member xe-11/0/16
rpd[1866]: %DAEMON-6: Decode ifd xe-11/0/16 index 2456: ifdm_flags 0xc000
mcsnoopd[94056]: %DAEMON-6: Decode ifd xe-11/0/16 index 2456: ifdm_flags 0xc000
mcsnoopd[94056]: %DAEMON-6: krt_decode_iflogical: xe-11/0/16.0 has got color 0
/kernel: %KERN-5-KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd xe-11/0/16 - CD state - ready to carry traffic
/kernel: %KERN-5-KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd xe-10/0/16 - CD state - ready to carry traffic
rpd[1866]: %DAEMON-6: Decode ifd xe-11/0/16 index 2456: ifdm_flags 0xc000
rpd[1866]: %DAEMON-6: Decode ifd xe-10/0/16 index 2406: ifdm_flags 0xc000
mcsnoopd[94056]: %DAEMON-6: Decode ifd xe-11/0/16 index 2456: ifdm_flags 0xc000
mcsnoopd[94056]: %DAEMON-6: krt_decode_iflogical: xe-11/0/16.0 has got color 0
mcsnoopd[94056]: %DAEMON-6: received iff message xe-11/0/16.0 ifl 8c6fcf0 op 2 flag 0
mcsnoopd[94056]: %DAEMON-6: KRT Ifstate: Decode iff message - ifl(xe-11/0/16.0) without mesh-group tlv
mcsnoopd[94056]: %DAEMON-6: Decode ifd xe-10/0/16 index 2406: ifdm_flags 0xc000
mcsnoopd[94056]: %DAEMON-6: krt_decode_iflogical: xe-10/0/16.0 has got color 0
mcsnoopd[94056]: %DAEMON-6: received iff message xe-10/0/16.0 ifl 8be35a0 op 2 flag 0
mcsnoopd[94056]: %DAEMON-6: KRT Ifstate: Decode iff message - ifl(xe-10/0/16.0) without mesh-group tlv

We have upgraded to the 4.14.52 kernel, hoping it might include an ixgbe patch that fixes this, but the problem still persists.
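
In case it helps anyone check for the same symptom across a fleet, here is a rough sketch in Python that reads the text output of `ethtool <iface>` and flags a port whose negotiated speed is below the highest speed it advertises. The function names (`parse_ethtool`, `is_degraded`) and the `SAMPLE` string are made up for illustration; the sample is just the eno1 output above, and the parsing assumes the field layout shown there rather than every ethtool version.

```python
import re

# Sample taken from the eno1 output in this post (trimmed to the relevant fields).
SAMPLE = """Settings for eno1:
        Supported link modes:   100baseT/Full
                                1000baseT/Full
                                10000baseT/Full
        Advertised link modes:  100baseT/Full
                                1000baseT/Full
                                10000baseT/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Auto-negotiation: on
        Link detected: yes
"""


def parse_ethtool(text):
    """Return (negotiated_mbps, highest_advertised_mbps) from ethtool text.

    Either element is None if it cannot be found (for example when the
    link is down and ethtool reports an unknown speed).
    """
    match = re.search(r"^\s*Speed:\s*(\d+)Mb/s", text, re.MULTILINE)
    speed = int(match.group(1)) if match else None

    advertised = []
    in_block = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Advertised link modes:"):
            # First mode sits on the same line as the field name.
            in_block = True
            stripped = stripped.split(":", 1)[1]
        elif in_block and ":" in stripped:
            # Any later "Name: value" line ends the multi-line mode list.
            in_block = False
        if in_block:
            advertised += [int(s) for s in re.findall(r"(\d+)base", stripped)]

    return speed, (max(advertised) if advertised else None)


def is_degraded(text):
    """True when the link came up slower than the best advertised mode."""
    speed, best = parse_ethtool(text)
    return speed is not None and best is not None and speed < best


if __name__ == "__main__":
    speed, best = parse_ethtool(SAMPLE)
    print(f"negotiated={speed}Mb/s best_advertised={best}Mb/s "
          f"degraded={is_degraded(SAMPLE)}")
```

On a real host the text would come from running `ethtool eno1` (for example via `subprocess.run`) instead of the embedded sample. This only detects the mismatch; it says nothing about whether the cause is cabling, the switch port, or the driver.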

I am posting here to seek advice on how to diagnose and, ideally, fix this problem.

Thanks!

Abejide Ayodele
It always seems impossible until it's done. --Nelson Mandela

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired