On 5 Nov 2020, at 12:21, Toke Høiland-Jørgensen wrote:

"Thomas Rosenstein" <[email protected]> writes:

If so, this sounds more like a driver issue, or maybe something to do
with scheduling. Does it only happen with ICMP? You could try this
tool
for a userspace UDP measurement:

It happens with all packets, therefore the transfer to backblaze with 40
threads goes down to ~8MB/s instead of >60MB/s

Huh, right, definitely sounds like a kernel bug; or maybe the new kernel
is getting the hardware into a state where it bugs out when there are
lots of flows or something.

You could try looking at the ethtool stats (ethtool -S) while running
the test and see if any error counters go up. Here's a handy script to
monitor changes in the counters:

https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl

I'll try what that reports!

Also, what happens if you ping a host on the internet (*through* the
router instead of *to* it)?

Same issue, but twice pronounced, as it seems all interfaces are
affected.
So, ping on one interface and the second has the issue.
Also all traffic across the host has the issue, but on both sides, so
ping to the internet increased by 2x

Right, so even an unloaded interface suffers? But this is the same NIC,
right? So it could still be a hardware issue...

Yep default that CentOS ships, I just tested 4.12.5 there the issue also does not happen. So I guess I can bisect it then...(really don't want to
😃)

Well that at least narrows it down :)

I just tested 5.9.4 seems to also fix it partly, I have long stretches where it looks good, and then some increases again. (3.10 Stock has them too, but not so high, rather 1-3 ms)

for example:

64 bytes from x.x.x.x: icmp_seq=10 ttl=64 time=0.169 ms
64 bytes from x.x.x.x: icmp_seq=11 ttl=64 time=5.53 ms
64 bytes from x.x.x.x: icmp_seq=12 ttl=64 time=9.44 ms
64 bytes from x.x.x.x: icmp_seq=13 ttl=64 time=0.167 ms
64 bytes from x.x.x.x: icmp_seq=14 ttl=64 time=3.88 ms

and then again:

64 bytes from x.x.x.x: icmp_seq=15 ttl=64 time=0.569 ms
64 bytes from x.x.x.x: icmp_seq=16 ttl=64 time=0.148 ms
64 bytes from x.x.x.x: icmp_seq=17 ttl=64 time=0.286 ms
64 bytes from x.x.x.x: icmp_seq=18 ttl=64 time=0.257 ms
64 bytes from x.x.x.x: icmp_seq=19 ttl=64 time=0.220 ms
64 bytes from x.x.x.x: icmp_seq=20 ttl=64 time=0.125 ms
64 bytes from x.x.x.x: icmp_seq=21 ttl=64 time=0.188 ms
64 bytes from x.x.x.x: icmp_seq=22 ttl=64 time=0.202 ms
64 bytes from x.x.x.x: icmp_seq=23 ttl=64 time=0.195 ms
64 bytes from x.x.x.x: icmp_seq=24 ttl=64 time=0.177 ms
64 bytes from x.x.x.x: icmp_seq=25 ttl=64 time=0.242 ms
64 bytes from x.x.x.x: icmp_seq=26 ttl=64 time=0.339 ms
64 bytes from x.x.x.x: icmp_seq=27 ttl=64 time=0.183 ms
64 bytes from x.x.x.x: icmp_seq=28 ttl=64 time=0.221 ms
64 bytes from x.x.x.x: icmp_seq=29 ttl=64 time=0.317 ms
64 bytes from x.x.x.x: icmp_seq=30 ttl=64 time=0.210 ms
64 bytes from x.x.x.x: icmp_seq=31 ttl=64 time=0.242 ms
64 bytes from x.x.x.x: icmp_seq=32 ttl=64 time=0.127 ms
64 bytes from x.x.x.x: icmp_seq=33 ttl=64 time=0.217 ms
64 bytes from x.x.x.x: icmp_seq=34 ttl=64 time=0.184 ms


For me it looks now that there was some fix between 5.4.60 and 5.9.4 ... anyone can pinpoint it?



How did you configure the new kernel? Did you start from scratch, or
is
it based on the old centos config?

first oldconfig and from there then added additional options for IB,
NVMe, etc (which I don't really need on the routers)

OK, so you're probably building with roughly the same options in terms
of scheduling granularity etc. That's good. Did you enable spectre
mitigations etc on the new kernel? What's the output of
`tail /sys/devices/system/cpu/vulnerabilities/*` ?

mitigations are off

==> /sys/devices/system/cpu/vulnerabilities/itlb_multihit <==
KVM: Vulnerable

==> /sys/devices/system/cpu/vulnerabilities/l1tf <==
Mitigation: PTE Inversion; VMX: vulnerable

==> /sys/devices/system/cpu/vulnerabilities/mds <==
Vulnerable; SMT vulnerable

==> /sys/devices/system/cpu/vulnerabilities/meltdown <==
Vulnerable

==> /sys/devices/system/cpu/vulnerabilities/spec_store_bypass <==
Vulnerable

==> /sys/devices/system/cpu/vulnerabilities/spectre_v1 <==
Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers

==> /sys/devices/system/cpu/vulnerabilities/spectre_v2 <==
Vulnerable, STIBP: disabled

==> /sys/devices/system/cpu/vulnerabilities/srbds <==
Not affected

==> /sys/devices/system/cpu/vulnerabilities/tsx_async_abort <==
Not affected


Grub Boot options are: crashkernel=896M rd.lvm.lv=cl/root net.ifnames=0 biosdevname=0 scsi_mod.use_blk_mq=1 dm_mod.use_blk_mq=y mitigations=off console=tty0 console=ttyS1,115200


-Toke
_______________________________________________
Bloat mailing list
[email protected]
https://lists.bufferbloat.net/listinfo/bloat

Reply via email to