On 11/16/23 17:24, Christian Rohmann wrote:
Dear sir or madam,

we run multiple Intel E810-CQDA2 100G adapters (2x QSFP28) in our fleet of servers. The machines are running Ubuntu 22.04 LTS (Jammy) with Linux kernel 6.2.0-36-generic (Ubuntu HWE kernel).

This is the output from ethtool:

--- cut ---
# ethtool -i eth2
driver: ice
version: 6.2.0-36-generic
firmware-version: 4.30 0x8001af29 1.3429.0
expansion-rom-version:
bus-info: 0000:a1:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

--- cut ---

We observe strange, totally unrealistic traffic spikes (Multiple Terabits/s) in our monitoring. We use the Prometheus Node Exporter and the netdev collector (https://github.com/prometheus/node_exporter/blob/ed1b8e3d88851806627e4f8262ee26232ca56c2c/collector/netdev_common.go#L39). I found issue https://github.com/prometheus/node_exporter/issues/1849 and it appears that others have noticed similar issues with the counters.

I have now dumped "/proc/net/dev" of one of the machines once per second to a logfile per interface, to show that the issue actually originates from the "ice" kernel driver and not from any of our other tooling.

Good move!


I can provide the whole files, but if you just look at two timestamps in particular, you can actually see two jumps in the counters:

--- cut ---
Inter-|   Receive |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
[...]
Nov 16 14:44:17   eth2: 322480275246795 161202637791 12245 2396226 0 0          0  71204126 497958797609464 188500340907    0    0 0 0       0          0
Nov 16 14:44:18   eth2: 386617853382565 193953665830 12245 2396226 0 0          0  71204282 593586606935949 223802656120    0    0 0 0       0          0
[...]
Nov 16 14:49:10   eth2: 386662845936810 193977501895 12247 2396226 0 0          0  71230993 593637495306092 223827197609    0    0 0 0       0          0
Nov 16 14:49:11   eth2: 450845520538932 226752438356 12247 2396226 0 0          0  71230993 689316465134429 259154140003    0    0 0 0       0          0
[...]
--- cut ---
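
For anyone who wants to check their own logs for the same pattern, here is a rough sketch (not part of the original report) that turns consecutive samples into bit rates and flags any interval faster than the port can physically carry. It assumes the log format shown above (timestamp, interface name, then the 16 counters of /proc/net/dev), a line rate of 100 Gbit/s per port, and the year 2023 for the timestamps. The names parse_sample, implausible_jumps and LINK_BPS are made up for this illustration.

```python
from datetime import datetime

LINK_BPS = 100e9  # assumption: nominal rate of one 100G port, in bits per second


def parse_sample(line, year=2023):
    """Split one timestamped /proc/net/dev log line into (time, rx_bytes, tx_bytes)."""
    fields = line.split()
    # fields[0:3] is the timestamp, fields[3] is the interface name ("eth2:")
    when = datetime.strptime(f"{year} " + " ".join(fields[:3]), "%Y %b %d %H:%M:%S")
    counters = [int(value) for value in fields[4:]]
    # 8 receive counters come first, so index 8 is the transmit byte counter
    return when, counters[0], counters[8]


def implausible_jumps(lines, link_bps=LINK_BPS):
    """Return (time, direction, bits_per_second) for every interval above line rate."""
    samples = [parse_sample(line) for line in lines]
    jumps = []
    for (t0, rx0, tx0), (t1, rx1, tx1) in zip(samples, samples[1:]):
        seconds = (t1 - t0).total_seconds()
        if seconds <= 0:
            continue  # skip duplicate or out-of-order timestamps
        for direction, delta in (("rx", rx1 - rx0), ("tx", tx1 - tx0)):
            rate = delta * 8 / seconds
            if rate > link_bps:
                jumps.append((t1, direction, rate))
    return jumps


# The four samples quoted above, with column padding collapsed.
SAMPLES = [
    "Nov 16 14:44:17 eth2: 322480275246795 161202637791 12245 2396226 0 0 0 71204126 497958797609464 188500340907 0 0 0 0 0 0",
    "Nov 16 14:44:18 eth2: 386617853382565 193953665830 12245 2396226 0 0 0 71204282 593586606935949 223802656120 0 0 0 0 0 0",
    "Nov 16 14:49:10 eth2: 386662845936810 193977501895 12247 2396226 0 0 0 71230993 593637495306092 223827197609 0 0 0 0 0 0",
    "Nov 16 14:49:11 eth2: 450845520538932 226752438356 12247 2396226 0 0 0 71230993 689316465134429 259154140003 0 0 0 0 0 0",
]

for when, direction, rate in implausible_jumps(SAMPLES):
    print(f"{when:%H:%M:%S} {direction}: {rate / 1e12:.1f} Tbit/s")
```

On the four samples above, this flags both one-second intervals in both directions. The first receive jump alone is 64137578135770 bytes in one second, roughly 513 Tbit/s, while the 292 seconds in between average well under 2 Gbit/s.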


If you require any more information to narrow down the issue, please don't hesitate to contact me.

Was there anything logged in dmesg or other system logs at that time?




Regards


Christian Rohmann



Thank you for the report, I will take a look.

We have already received a similar report from Nebojsa Stevanovic, CCed.

Sorry that the issue is not resolved yet. I will review what we have
changed in the drivers between 6.1 and 6.2, where the bug was introduced.

Best regards,
Przemek Kitszel

_______________________________________________
Intel-wired-lan mailing list
[email protected]
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
