Hi Assaf,

Thank you for the data. I see from the data files you included that you are working with a Cisco-branded E810-CQDA2 NIC.

As this is a Cisco-supported NIC, have you consulted Cisco support and configured your system with Cisco-approved firmware/vendor versions? I do not support Cisco products, but I see immediately that the NIC firmware (FW) is revision 2.25. The ice driver v1.9.11 was developed at Intel for use with 4.xx firmware.

Please contact Cisco. If they cannot resolve the matter, they will reach out to the appropriate Intel support team for this product.

Best regards,
- Don

From: Assaf Albo <ass...@qwilt.com>
Sent: Wednesday, December 6, 2023 3:34 AM
To: Buchholz, Donald <donald.buchh...@intel.com>
Cc: Brandeburg, Jesse <jesse.brandeb...@intel.com>; e1000-devel@lists.sourceforge.net; Matan Levy <mat...@qwilt.com>; Itamar Maron <itam...@qwilt.com>
Subject: Re: [e1000-devel] Intel E810 100Gb goes down sporadically

Hey guys,

Firstly, I'd like to thank you all for helping us out. Attached to this mail are two files with all the statistics (client machine + server machine).

"The passthrough device shouldn't be any problem but I do recommend that if you're passing through the device to a VM, you try to match the destination PCIe function number to the origination ID to prevent odd issues. like if your host device is: 01:00.1 then (I'm not sure you can do this) I'd hope the VM device is 00:06.1, and not 00:06.0"

That is exactly what we are doing: the function numbers match. You can see in the attached files that one of the machines uses eth0 at 00:06.0 and the other uses eth1 at 00:06.1.

"Also, do you see any stats or events on the switch side when link is lost?"

We use Cisco Nexus switches, and our network engineer said that he sees link-down events on the ports.

On Wed, Dec 6, 2023 at 6:42 AM Buchholz, Donald <donald.buchh...@intel.com> wrote:

Hi Assaf,

In addition to the commands listed by Jesse, please also provide "ethtool -i <eth#>" output. This will assist us in identifying the NIC and firmware revision you are using.

- Don

> -----Original Message-----
> From: Jesse Brandeburg <jesse.brandeb...@intel.com>
> Sent: Tuesday, December 5, 2023 10:47 AM
> To: Assaf Albo <ass...@qwilt.com>; e1000-devel@lists.sourceforge.net; Matan Levy <mat...@qwilt.com>
> Subject: Re: [e1000-devel] Intel E810 100Gb goes down sporadically
>
> On 12/3/2023 1:26 AM, Assaf Albo via E1000-devel wrote:
> > Hello guys,
> >
> > We are having constant network issues in production: the link goes
> > down, waits *exactly* 7-8 seconds, and comes up again.
> > This can happen zero to a few times a day on all our servers; they are not
> > in the same location and are connected to different network devices.
> >
> > Each server runs as a KVM virtual machine with 60 pinned CPUs and 224 GiB
> > of huge pages; overall performance is excellent.
> > The NIC is passed through (PCI passthrough) to the KVM machine as is.
> > OS: Rocky Linux 8.5, kernel 4.18.0-348.23.1.el8_5.x86_64, with Intel ice
> > 1.9.11 built and installed using rpm.
> > We have a traffic generator between two servers (our app: client+server)
> > that reaches 94 Gbps and can replicate this issue.
> >
> > The dmesg output when the issue occurs:
> > Nov 28 16:01:27 SERVER kernel: ice 0000:00:06.0 eth0: NIC Link is Down
> > Nov 28 16:01:35 SERVER kernel: ice 0000:00:06.0 eth0: NIC Link is up 100 Gbps Full Duplex, Requested FEC: RS-FEC, Negotiated FEC: RS-FEC, Autoneg Advertised: Off, Autoneg Negotiated: False, Flow Control: None
>
> Hi Assaf, sorry to hear you're having problems.
>
> W.r.t. the link-down events, we need to determine whether the link went
> down locally or remotely.
>
> Please gather the 'ethtool -S eth0' statistics for a system that has had
> some problems, and send them to the list as text.
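
The two dmesg lines in the report already give the outage length. A minimal sketch, assuming syslog-style timestamps exactly as quoted in the report and the year 2023 (syslog lines omit the year), that pairs each "NIC Link is Down" with the next "NIC Link is up" and reports the gap in seconds. The function name `link_down_durations` is hypothetical and not part of any Intel or ethtool tooling.

```python
from datetime import datetime

# The two dmesg lines quoted in the report (link-down event on eth0, 00:06.0).
LOG = """\
Nov 28 16:01:27 SERVER kernel: ice 0000:00:06.0 eth0: NIC Link is Down
Nov 28 16:01:35 SERVER kernel: ice 0000:00:06.0 eth0: NIC Link is up 100 Gbps Full Duplex, Requested FEC: RS-FEC, Negotiated FEC: RS-FEC, Autoneg Advertised: Off, Autoneg Negotiated: False, Flow Control: None
"""


def link_down_durations(log: str, year: int = 2023) -> list[float]:
    """Seconds from each 'NIC Link is Down' to the next 'NIC Link is up'.

    Hypothetical helper. Syslog timestamps omit the year (assumed here,
    default 2023) and have 1-second resolution, so each result is only
    accurate to about 1 s.
    """
    durations: list[float] = []
    down_at = None
    for line in log.splitlines():
        fields = line.split()
        if len(fields) < 3 or "NIC Link is" not in line:
            continue
        # The first three fields are "Mon DD HH:MM:SS"; prepend the assumed year.
        stamp = datetime.strptime(f"{year} " + " ".join(fields[:3]), "%Y %b %d %H:%M:%S")
        if "NIC Link is Down" in line:
            down_at = stamp
        elif "NIC Link is up" in line and down_at is not None:
            durations.append((stamp - down_at).total_seconds())
            down_at = None
    return durations


print(link_down_durations(LOG))  # [8.0]
```

Because the timestamps only have 1-second resolution, an 8 s result is consistent with the reported "7-8 seconds"; tighter timing would need the kernel's own dmesg timestamps rather than syslog ones.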
>
> Also, 'ethtool -m eth0'.
>
> The passthrough device shouldn't be any problem, but I do recommend that
> if you're passing through the device to a VM, you try to match the
> destination PCIe function number to the origination ID to prevent odd
> issues.
>
> For example, if your host device is
> 01:00.1, then (I'm not sure you can do this) I'd hope the VM device is
> 00:06.1, and not 00:06.0.
>
> So, with that in mind, I'd ask: do you ever see the link-down issues on
> systems where the function numbers already match, such as
> 3b:00.0 (ice PF PCIe in host)
> 00:06.0 (ice PF in VM)?
>
> Please include output from 'devlink dev info' and, if you know it, which
> switch you're connected to.
>
> Also, do you see any stats or events on the switch side when link is lost?
>
> - Jesse
>
>
> _______________________________________________
> E1000-devel mailing list
> E1000-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/e1000-devel
> To learn more about Intel Ethernet, visit
> https://community.intel.com/t5/Ethernet-Products/bd-p/ethernet-products
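
Jesse's recommendation comes down to comparing the digit after the dot in the host and guest PCI addresses. A minimal sketch, assuming addresses written as in this thread (bus:device.function, optionally with a leading domain such as 0000:); the helper names `pci_function` and `functions_match` are hypothetical, and the check only compares address strings rather than querying the host or the hypervisor.

```python
import re

# A PCI address such as "3b:00.0" or "0000:00:06.1": optional 4-hex-digit
# domain, then bus:device.function (function is 0-7).
BDF = re.compile(r"^(?:[0-9a-fA-F]{4}:)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$")


def pci_function(address: str) -> int:
    """Return the PCI function number (the digit after the dot)."""
    m = BDF.match(address.strip())
    if m is None:
        raise ValueError(f"not a PCI address: {address!r}")
    return int(m.group(3))


def functions_match(host: str, guest: str) -> bool:
    """True when the passed-through device keeps its host function number in the VM."""
    return pci_function(host) == pci_function(guest)


print(functions_match("01:00.1", "00:06.1"))  # True: the mapping Jesse recommends
print(functions_match("01:00.1", "00:06.0"))  # False: the mapping he advises against
print(functions_match("3b:00.0", "00:06.0"))  # True: the example pair from his question
```

Comparing only the function number (not bus or device) mirrors the advice in the thread: the guest's bus and slot (00:06) can differ freely from the host's, as long as the function digit is preserved.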