On 9 Nov 2020, at 9:24, Jesper Dangaard Brouer wrote:

On Sat, 07 Nov 2020 14:00:04 +0100
Thomas Rosenstein via Bloat <[email protected]> wrote:

Here's an extract from the ethtool https://pastebin.com/cabpWGFz just in
case there's something hidden.

Yes, there is something hiding in the data from ethtool_stats.pl[1]:
(10G Mellanox Connect-X cards via 10G SFP+ DAC)

 stat:            1 (          1) <= outbound_pci_stalled_wr_events /sec
 stat:    339731557 (339,731,557) <= rx_buffer_passed_thres_phy /sec

I've not seen this counter 'rx_buffer_passed_thres_phy' before. Looking
in the kernel driver code, it is related to "rx_buffer_almost_full".
The number per second is excessive (but it may be related to a driver bug,
as it ends up reading "high" -> rx_buffer_almost_full_high in the
extended counters).

 stat:     29583661 ( 29,583,661) <= rx_bytes /sec
 stat:     30343677 ( 30,343,677) <= rx_bytes_phy /sec

You are receiving at 236 Mbit/s on a 10 Gbit/s link.  There is a
difference between what the OS sees (rx_bytes) and what the NIC
hardware sees (rx_bytes_phy) (diff approx. 6 Mbit/s).
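
The rates quoted above can be reproduced from the two byte counters. This is only a small sketch of the arithmetic; the helper name to_mbit is made up for illustration, and it assumes 1 Mbit = 10^6 bits.

```python
# Per-second counter values copied from the ethtool_stats.pl output above.
rx_bytes = 29_583_661      # bytes/sec as seen by the OS
rx_bytes_phy = 30_343_677  # bytes/sec as seen by the NIC hardware


def to_mbit(bytes_per_sec):
    """Convert bytes per second to Mbit/s (hypothetical helper, 1 Mbit = 10**6 bits)."""
    return bytes_per_sec * 8 / 1e6


os_rate = to_mbit(rx_bytes)                 # rate the OS sees
phy_rate = to_mbit(rx_bytes_phy)            # rate the hardware sees
diff = to_mbit(rx_bytes_phy - rx_bytes)     # traffic seen by HW but not by OS

print(f"OS  rate: {os_rate:.1f} Mbit/s")
print(f"PHY rate: {phy_rate:.1f} Mbit/s")
print(f"diff    : {diff:.1f} Mbit/s")
```

The OS rate comes out just under 237 Mbit/s and the difference at roughly 6 Mbit/s, which matches the figures given above.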

 stat:        19552 (     19,552) <= rx_packets /sec
 stat:        19950 (     19,950) <= rx_packets_phy /sec

Could these packets be from VLAN interfaces that are not used in the OS?


The RX packet counters above also indicate that the HW is seeing more
packets than the OS is receiving.

The next counters are likely your problem:

 stat:          718 (        718) <= tx_global_pause /sec
 stat:       954035 (    954,035) <= tx_global_pause_duration /sec
 stat:          714 (        714) <= tx_pause_ctrl_phy /sec

As far as I can see that's only the TX, and we are only doing RX on this interface - so maybe that's irrelevant?


It looks like you have enabled Ethernet Flow-Control, and something is
causing pause frames to be generated. It seems strange that this happens
on a 10 Gbit/s link carrying only 236 Mbit/s.
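
To get a feeling for how much pausing the counters above represent, here is a rough sketch. It assumes that tx_global_pause_duration is reported in microseconds; that unit is an assumption on my part and should be checked against the driver documentation before drawing conclusions.

```python
# Per-second counter values copied from the ethtool_stats.pl output above.
tx_global_pause = 718                # pause events per second
tx_global_pause_duration = 954_035   # ASSUMED unit: microseconds per second
tx_pause_ctrl_phy = 714              # pause control frames per second on the PHY

US_PER_SEC = 1_000_000

# Fraction of each second covered by requested pauses (if the unit is usec).
paused_fraction = tx_global_pause_duration / US_PER_SEC

# Average requested pause time per pause event (same unit assumption).
avg_pause_us = tx_global_pause_duration / tx_global_pause

print(f"paused fraction : {paused_fraction:.1%}")
print(f"avg pause length: {avg_pause_us:.0f} usec")
```

If the unit assumption holds, the NIC would be asking its link partner to pause for most of every second, which would be remarkable at this traffic level.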

The TX byte counters are also very strange:

 stat:        26063 (     26,063) <= tx_bytes /sec
 stat:        71950 (     71,950) <= tx_bytes_phy /sec

Also, that's TX, and we are only doing RX. As I already said somewhere, it's async (asymmetric) routing, so the TX data comes back via another router.


--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

[1] https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl

Strange size distribution:
 stat:     19922 (     19,922) <= rx_1519_to_2047_bytes_phy /sec
 stat:        14 (         14) <= rx_65_to_127_bytes_phy /sec
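
As a cross-check of the size distribution, the average frame size can be derived from the byte and packet counters earlier in this mail. This is only a sketch of the arithmetic; whether the _phy byte counter includes the 4-byte FCS is an assumption I have not verified.

```python
# Per-second counter values copied from the ethtool_stats.pl output.
rx_bytes, rx_packets = 29_583_661, 19_552          # as seen by the OS
rx_bytes_phy, rx_packets_phy = 30_343_677, 19_950  # as seen by the NIC hardware
rx_1519_to_2047_bytes_phy = 19_922                 # frames/sec in that size bucket

avg_os = rx_bytes / rx_packets            # average bytes per packet (OS view)
avg_phy = rx_bytes_phy / rx_packets_phy   # average bytes per frame (HW view)

# Share of all frames seen by the hardware that fall in the 1519-2047 bucket.
bucket_share = rx_1519_to_2047_bytes_phy / rx_packets_phy

print(f"avg size (OS) : {avg_os:.1f} bytes")
print(f"avg size (PHY): {avg_phy:.1f} bytes")
print(f"1519-2047 share: {bucket_share:.2%}")
```

The hardware-side average lands just inside the 1519-2047 bucket. A full 1500-byte packet plus a 14-byte Ethernet header, one 4-byte VLAN tag and a 4-byte FCS would be 1522 bytes, so this could simply be VLAN-tagged full-size frames, but that is a guess.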
_______________________________________________
Bloat mailing list
[email protected]
https://lists.bufferbloat.net/listinfo/bloat
