On 16 Nov 2020, at 12:56, Jesper Dangaard Brouer wrote:
On Fri, 13 Nov 2020 07:31:26 +0100
"Thomas Rosenstein" <[email protected]> wrote:
On 12 Nov 2020, at 16:42, Jesper Dangaard Brouer wrote:
On Thu, 12 Nov 2020 14:42:59 +0100
"Thomas Rosenstein" <[email protected]> wrote:
Notice "Adaptive" setting is on. My long-shot theory(2) is that
this
adaptive algorithm in the driver code can guess wrong (due to not
taking TSO into account) and cause issues for
Try to turn this adaptive algorithm off:
ethtool -C eth4 adaptive-rx off adaptive-tx off
[...]
rx-usecs: 32
When you turn off "adaptive-rx" you will get 31250 interrupts/sec
(calc: 1/(32/10^6) = 31250).
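For reference, a rough way to measure the actual rate is to diff the
counters in /proc/interrupts over one second (just a sketch; it assumes
the NIC's queue IRQs show up with "eth4" in their names, which depends
on the driver):

  # sum all per-CPU counters for IRQ lines matching "eth4", 1 second apart
  a=$(awk '/eth4/ { for (i = 2; i <= NF; i++) if ($i ~ /^[0-9]+$/) s += $i } END { print s }' /proc/interrupts)
  sleep 1
  b=$(awk '/eth4/ { for (i = 2; i <= NF; i++) if ($i ~ /^[0-9]+$/) s += $i } END { print s }' /proc/interrupts)
  echo "eth4 interrupts/sec: $((b - a))"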
rx-frames: 64
[...]
tx-usecs-irq: 0
tx-frames-irq: 0
[...]
I have now updated the settings to:
ethtool -c eth4
Coalesce parameters for eth4:
Adaptive RX: off TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
rx-usecs: 0
Please put a value in rx-usecs, like 20 or 10.
The value 0 is often used to signal the driver to do adaptive coalescing.
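E.g. something like this (assuming eth4 is the interface in question,
with adaptive-rx already turned off as above):

  ethtool -C eth4 rx-usecs 20
  ethtool -c eth4    # verify the new value took effect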
Ok, I have set it to 10 now.
Setting it to 10 is a little aggressive, as you ask it to generate
100,000 interrupts per sec. (Watch with 'vmstat 1' to see it.)
1/(10/10^6) = 100000 interrupts/sec
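E.g. run this on the router while a transfer is going; note the "in"
column is total interrupts/sec for the whole box, so it also includes
timer and other IRQs:

  vmstat 1 5    # "in" = interrupts/sec, "cs" = context switches/sec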
Goes a bit quicker (transfer up to 26 MB/s), but discards and PCI
stalls are still there.
Why are you measuring in (26) MBytes/sec? (That equals 208 Mbit/s.)
yep 208 MBits
If you still have ethtool PHY-discards, then you still have a problem.
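A quick way to keep an eye on that is to snapshot the NIC statistics
before and after a test run and diff them, e.g. (assuming this NIC's
counters contain "discard" in their names):

  ethtool -S eth4 | grep -i discard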
Ping times are noticeably improved:
Okay, so this means these changes did have a positive effect. So this
can be related to the OS not getting activated fast enough by NIC
interrupts.
64 bytes from x.x.x.x: icmp_seq=39 ttl=64 time=0.172 ms
64 bytes from x.x.x.x: icmp_seq=40 ttl=64 time=0.414 ms
64 bytes from x.x.x.x: icmp_seq=41 ttl=64 time=0.183 ms
64 bytes from x.x.x.x: icmp_seq=42 ttl=64 time=1.41 ms
64 bytes from x.x.x.x: icmp_seq=43 ttl=64 time=0.172 ms
64 bytes from x.x.x.x: icmp_seq=44 ttl=64 time=0.228 ms
64 bytes from x.x.x.x: icmp_seq=46 ttl=64 time=0.120 ms
64 bytes from x.x.x.x: icmp_seq=47 ttl=64 time=1.47 ms
64 bytes from x.x.x.x: icmp_seq=48 ttl=64 time=0.162 ms
64 bytes from x.x.x.x: icmp_seq=49 ttl=64 time=0.160 ms
64 bytes from x.x.x.x: icmp_seq=50 ttl=64 time=0.158 ms
64 bytes from x.x.x.x: icmp_seq=51 ttl=64 time=0.113 ms
Can you try to test if disabling TSO, GRO and GSO makes a difference?
ethtool -K eth4 gso off gro off tso off
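(To confirm the change stuck, the current offload settings can be
listed with something like:)

  ethtool -k eth4 | egrep 'segmentation|receive-offload'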
I had a call yesterday with Mellanox and we added the following boot
options: intel_idle.max_cstate=0 processor.max_cstate=1 idle=poll
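(For anyone following along: these are kernel command-line parameters;
on a GRUB-based distro they go roughly like this, though the exact
paths and grub tool differ per distro:)

  # /etc/default/grub
  GRUB_CMDLINE_LINUX="... intel_idle.max_cstate=0 processor.max_cstate=1 idle=poll"
  # regenerate the grub config and reboot, e.g. on RHEL/CentOS:
  grub2-mkconfig -o /boot/grub2/grub.cfg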
This completely solved the problem, but now the box runs as a heater
and energy consumer, drawing nearly 2x the Watts at the outlet.
I had no discards, super pings during the transfer (< 0.100 ms), no
outliers, and good transfer rates > 50 MB/s.
So it seems to be related to the C-state management in newer kernel
versions being too aggressive.
I would like to try some tuning here; maybe we can get some input on
which knobs to turn?
I will read here:
https://www.kernel.org/doc/html/latest/admin-guide/pm/cpuidle.html#idle-states-representation
and related docs, I think there will be a few helpful hints.
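One less drastic knob than idle=poll (just a sketch based on the
cpuidle sysfs interface described in those docs; the state numbering
and names vary per CPU, so check the 'name' files first) is to disable
only the deeper idle states at runtime:

  # list the available idle states and their exit latencies
  grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name \
         /sys/devices/system/cpu/cpu0/cpuidle/state*/latency
  # disable everything deeper than state1 (usually C1/C1E) on all CPUs
  for d in /sys/devices/system/cpu/cpu[0-9]*/cpuidle/state[2-9]; do
      echo 1 > "$d/disable"
  done

Another option is the PM QoS interface (/dev/cpu_dma_latency), which is
what tuned's latency-performance profile uses, but the sysfs route is
the most direct way to experiment.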
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer