On 12 Nov 2020, at 11:05, Jesper Dangaard Brouer wrote:
On Mon, 09 Nov 2020 15:33:46 +0100
"Thomas Rosenstein" <[email protected]> wrote:
On 9 Nov 2020, at 12:40, Jesper Dangaard Brouer wrote:
On Mon, 09 Nov 2020 11:09:33 +0100
"Thomas Rosenstein" <[email protected]> wrote:
On 9 Nov 2020, at 9:24, Jesper Dangaard Brouer wrote:
On Sat, 07 Nov 2020 14:00:04 +0100
Thomas Rosenstein via Bloat <[email protected]> wrote:
Here's an extract from the ethtool output, https://pastebin.com/cabpWGFz, just in case there's something hidden.
Yes, there is something hiding in the data from ethtool_stats.pl[1]:
(10G Mellanox Connect-X cards via 10G SFP+ DAC)
stat: 1 ( 1) <= outbound_pci_stalled_wr_events /sec
stat: 339731557 (339,731,557) <= rx_buffer_passed_thres_phy /sec
I've not seen this counter 'rx_buffer_passed_thres_phy' before; looking in the kernel driver code, it is related to "rx_buffer_almost_full".
The numbers per second are excessive (but that may be related to a driver bug, as it ends up reading "high" -> rx_buffer_almost_full_high in the extended counters).
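If you want to keep an eye on just these counters while reproducing, something like the following should work (assuming eth4 is the port in question):

  watch -d -n1 "ethtool -S eth4 | grep -E 'rx_buffer_passed_thres_phy|outbound_pci_stalled'"

Note that ethtool -S only prints absolute counters; the per-second rates shown above come from ethtool_stats.pl diffing successive samples.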
I have now tested with a new kernel 5.9.4, built from the 3.10 config with make oldconfig, and I noticed an interesting effect.
For the first ca. 2 minutes the router behaves completely normally, as with 3.10; after that the ping times go crazy.
I have recorded this with ethtool, and also the ping times.
Ethtool: (13 MB)
https://drive.google.com/file/d/1Ojp64UUw0zKwrgF_CisZb3BCdidAJYZo/view?usp=sharing
The transfer was first doing around 50 - 70 MB/s; then, once the ping times got worse, it dropped to ~12 MB/s.
At ca. line 74324 of the ethtool log the transfer speed drops to 12 MB/s.
Let's add them here so we can see and talk about them:
74324 Ethtool(eth4 ) stat: 144584002 ( 144,584,002) <= rx_buffer_passed_thres_phy /sec
74325 Ethtool(eth4 ) stat: 17243186 ( 17,243,186) <= rx_bytes /sec
74326 Ethtool(eth4 ) stat: 19032079 ( 19,032,079) <= rx_bytes_phy /sec
Significant difference between RX-bytes Phy-level and OS-level.
17243186*8/10^6 = 138 Mbit/s
74327 Ethtool(eth4 ) stat: 321 ( 321) <= rx_cache_busy /sec
74328 Ethtool(eth4 ) stat: 321 ( 321) <= rx_cache_full /sec
74329 Ethtool(eth4 ) stat: 7323 ( 7,323) <= rx_cache_reuse /sec
74330 Ethtool(eth4 ) stat: 15188 ( 15,188) <= rx_csum_complete /sec
74331 Ethtool(eth4 ) stat: 1 ( 1) <= rx_csum_none /sec
74332 Ethtool(eth4 ) stat: 96 ( 96) <= rx_csum_unnecessary /sec
74333 Ethtool(eth4 ) stat: 1242 ( 1,242) <= rx_discards_phy /sec
Packets are getting rx_discards
74334 Ethtool(eth4 ) stat: 15285 ( 15,285) <= rx_packets /sec
74335 Ethtool(eth4 ) stat: 16655 ( 16,655) <= rx_packets_phy /sec
(16655-15285 = 1370)
Again difference between RX-packets Phy-level and OS-level
74336 Ethtool(eth4 ) stat: 19132497 ( 19,132,497) <= rx_prio0_bytes /sec
74337 Ethtool(eth4 ) stat: 1264 ( 1,264) <= rx_prio0_discards /sec
74338 Ethtool(eth4 ) stat: 15474 ( 15,474) <= rx_prio0_packets /sec
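As a rough cross-check that the phy-vs-OS packet gap lines up with the discards, one can diff two snapshots of those counters by hand. A minimal sketch, assuming eth4 and a 1-second window (run in bash):

  snap() { ethtool -S eth4 | awk '/rx_packets:|rx_packets_phy:|rx_discards_phy:/ {print $1, $2}'; }
  a=$(snap); sleep 1; b=$(snap)
  # print the per-second delta for each of the three counters
  paste <(echo "$a") <(echo "$b") | awk '{print $1, $4 - $2, "/sec"}'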
I have now also updated OFED on the routers and checked with 3.10 and 5.4.60 (new firmware for the adapters).
That changed absolutely nothing!
I noticed that the discards still only happen under kernel 5.4.60; on 3.10 they don't.
Based on that I would blame the kernel rather than the hardware.
Do you agree?
I wouldn't exclude any settings from being problematic for now; maybe some defaults changed.
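One way to catch changed defaults would be to dump the driver settings under both kernels and diff them; a sketch, assuming eth4:

  # offloads, ring sizes, pause and coalesce settings, tagged by kernel version
  for opt in -k -g -a -c; do ethtool $opt eth4; done > eth4-settings-$(uname -r).txt

A plain diff of the files from the 3.10 and 5.4.60 boots should then show any default that changed.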
Seems you are right about rx_buffer_passed_thres_phy: if you check just those lines, they appear more often once the speed dropped.
Not sure if that's the cause or an effect of the underlying problem!
Thanks for the link to the Mellanox doc:
https://community.mellanox.com/s/article/understanding-mlx5-ethtool-counters

The counter: 'rx_buffer_passed_thres_phy'
Description: 'The number of events where the port receive buffer was over 85% full. Supported from kernel 4.14'
The counter: 'tx_pause_ctrl_phy'
Description: 'The number of link layer pause packets transmitted on a physical port. If this counter is increasing, it implies that the NIC is congested and cannot absorb the traffic coming from the network.'
So, the NIC is clearly telling us that something is wrong. You are nowhere near link speed (12 MB/s => 96 Mbit/s), and still the port receive buffer is over 85% full most of the time. There are indications that the NIC is congested and cannot absorb the traffic coming from the network.
I have opened a support case with Mellanox; maybe they have an idea directly.
Their first response, based on outbound_pci_stalled_wr, was that the PCI Express credits for the card run out, which means (if I interpret it correctly) that the buffer is no longer emptied quickly enough.
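If the PCIe side is a suspect, it may also be worth confirming that the link trained at the expected width and speed; for example (with the card's bus address filled in, shown here only as a placeholder):

  lspci -vvv -s <bus:dev.fn> | grep -E 'LnkCap|LnkSta'

A link that came up narrower or slower than the card's capability could make it much easier to run out of posted-write credits.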
Long-shot theory(1): Could this be caused by CPUs going into too deep sleep states, with the NIC not sending the wake-up signal early enough, e.g. because it holds on to packets to generate a bulk at IRQ raise?
Long-shot theory(2): The NIC driver's adaptive IRQ interrupt coalescing (adjusted via ethtool -C) is wrong. The reason it is wrong is TSO frames, because the packet-to-byte ratio isn't taken into account.
Here are the coalesce parameters (they are the same for 3.10 and 5.4.60):
ethtool -c eth4
Coalesce parameters for eth4:
Adaptive RX: on TX: on
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
rx-usecs: 32
rx-frames: 64
rx-usecs-irq: 0
rx-frames-irq: 0
tx-usecs: 8
tx-frames: 128
tx-usecs-irq: 0
tx-frames-irq: 0
rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0
rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0
You can test theory(2) in two ways: via disabling TSO, and via adjusting ethtool --show-coalesce|--coalesce (-c|-C). The --coalesce change can change part of theory(1), the NIC wake-up/IRQ interval. For CPU sleep modes see the software package 'tuned'.
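Concretely, a test along those lines could look like this (eth4 assumed; the values are only starting points for the experiment, not recommendations):

  ethtool -K eth4 tso off gso off                           # test theory(2) without TSO/GSO
  ethtool -C eth4 adaptive-rx off rx-usecs 8 rx-frames 32   # fixed, shorter RX coalescing
  tuned-adm profile network-latency                         # tuned profile aimed at low latency (limits deep C-states)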
The CPU mode is set to performance; I thought that would stop any deep sleep states.
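Note that the performance cpufreq governor only pins the clock frequency; the idle C-states are handled separately by cpuidle and can still be entered. Something like this shows which states exist and how often they are used (assuming a standard cpuidle setup):

  grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
  grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/usage

Holding /dev/cpu_dma_latency open with a value of 0 forbids the deep states for the duration of a test (roughly what tuned's latency profiles do).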
Pings:
https://drive.google.com/file/d/16phOxM5IFU6RAl4Ua4pRqMNuLYBc4RK7/view?usp=sharing
Pause frames were activated again after the restart.
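(For reference, the pause behaviour can be checked and, for a test, disabled with ethtool; assuming eth4:

  ethtool -a eth4                  # show current pause settings
  ethtool -A eth4 rx off tx off    # disable pause frames for a test run

Disabling them doesn't remove the congestion, but it takes one variable out of the counter readings.)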
(Here is a link for reference to the ethtool variables: https://community.mellanox.com/s/article/understanding-mlx5-ethtool-counters)
To make this easier to digest, here are some of the interesting counters.
74171 Ethtool(eth4 ) stat: 24 ( 24) <= outbound_pci_stalled_wr /sec
74172 Ethtool(eth4 ) stat: 1 ( 1) <= outbound_pci_stalled_wr_events /sec
[...]
74628 Ethtool(eth5 ) stat: 40 ( 40) <= outbound_pci_stalled_wr /sec
74629 Ethtool(eth5 ) stat: 1 ( 1) <= outbound_pci_stalled_wr_events /sec
74539 Ethtool(eth4 ) stat: 432 ( 432) <= tx_global_pause /sec
74540 Ethtool(eth4 ) stat: 551264 ( 551,264) <= tx_global_pause_duration /sec
74541 Ethtool(eth4 ) stat: 434 ( 434) <= tx_mac_control_phy /sec
74542 Ethtool(eth4 ) stat: 31 ( 31) <= tx_nop /sec
74543 Ethtool(eth4 ) stat: 24888 ( 24,888) <= tx_packets /sec
74544 Ethtool(eth4 ) stat: 25187 ( 25,187) <= tx_packets_phy /sec
74545 Ethtool(eth4 ) stat: 434 ( 434) <= tx_pause_ctrl_phy /sec
74546 Ethtool(eth4 ) stat: 15190865 ( 15,190,865) <= tx_prio0_bytes /sec
74547 Ethtool(eth4 ) stat: 24907 ( 24,907) <= tx_prio0_packets /sec
74548 Ethtool(eth4 ) stat: 647795 ( 647,795) <= tx_tso_bytes /sec
74549 Ethtool(eth4 ) stat: 196 ( 196) <= tx_tso_packets /sec
74550 Ethtool(eth4 ) stat: 15005949 ( 15,005,949) <= tx_vport_unicast_bytes /sec
74551 Ethtool(eth4 ) stat: 24780 ( 24,780) <= tx_vport_unicast_packets /sec
74552 Ethtool(eth4 ) stat: 57 ( 57) <= tx_xmit_more /sec
[...]
75030 Ethtool(eth5 ) stat: 39 ( 39) <= tx_global_pause /sec
75031 Ethtool(eth5 ) stat: 242374 ( 242,374) <= tx_global_pause_duration /sec
75032 Ethtool(eth5 ) stat: 39 ( 39) <= tx_mac_control_phy /sec
75033 Ethtool(eth5 ) stat: 166 ( 166) <= tx_nop /sec
75034 Ethtool(eth5 ) stat: 14837 ( 14,837) <= tx_packets /sec
75035 Ethtool(eth5 ) stat: 14816 ( 14,816) <= tx_packets_phy /sec
75036 Ethtool(eth5 ) stat: 39 ( 39) <= tx_pause_ctrl_phy /sec
75037 Ethtool(eth5 ) stat: 16748584 ( 16,748,584) <= tx_prio0_bytes /sec
75038 Ethtool(eth5 ) stat: 14708 ( 14,708) <= tx_prio0_packets /sec
75039 Ethtool(eth5 ) stat: 2477442 ( 2,477,442) <= tx_tso_bytes /sec
75040 Ethtool(eth5 ) stat: 733 ( 733) <= tx_tso_packets /sec
75041 Ethtool(eth5 ) stat: 16765330 ( 16,765,330) <= tx_vport_unicast_bytes /sec
75042 Ethtool(eth5 ) stat: 14780 ( 14,780) <= tx_vport_unicast_packets /sec
75043 Ethtool(eth5 ) stat: 2 ( 2) <= tx_xmit_more /sec
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
Is it fine to always reply all here, or should I just keep the list?
_______________________________________________
Bloat mailing list
[email protected]
https://lists.bufferbloat.net/listinfo/bloat