On 5 Nov 2020, at 12:21, Toke Høiland-Jørgensen wrote:
"Thomas Rosenstein" <[email protected]> writes:
If so, this sounds more like a driver issue, or maybe something to do
with scheduling. Does it only happen with ICMP? You could try this tool
for a userspace UDP measurement:
It happens with all packets, so the transfer to Backblaze with 40
threads drops to ~8 MB/s instead of >60 MB/s.
Huh, right, definitely sounds like a kernel bug; or maybe the new kernel
is getting the hardware into a state where it bugs out when there are
lots of flows or something.
You could try looking at the ethtool stats (ethtool -S) while running
the test and see if any error counters go up. Here's a handy script to
monitor changes in the counters:
https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl
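For anyone without Perl handy, the same idea (snapshot the counters, then print only the ones that moved) can be sketched in a few lines of Python. This is a rough illustration and not a port of ethtool_stats.pl: the function names are made up for this example, and the parser assumes the usual "name: value" layout of `ethtool -S` output.

```python
import re
import subprocess
import time


def parse_ethtool_stats(text):
    """Parse `ethtool -S` output into a {counter_name: int} dict."""
    stats = {}
    for line in text.splitlines():
        # Counter lines look like "     rx_packets: 12345". The header line
        # "NIC statistics:" has no number after the colon and is skipped.
        m = re.match(r"^\s*(.+?):\s*(-?\d+)\s*$", line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats


def changed_counters(before, after):
    """Return {name: delta} for counters whose value changed between snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


def read_stats(iface):
    """Run `ethtool -S <iface>` and parse the result (needs ethtool installed)."""
    out = subprocess.run(
        ["ethtool", "-S", iface], capture_output=True, text=True, check=True
    ).stdout
    return parse_ethtool_stats(out)


def monitor(iface, interval=1.0):
    """Print the counters that changed during each interval, until interrupted."""
    prev = read_stats(iface)
    while True:
        time.sleep(interval)
        cur = read_stats(iface)
        for name, delta in sorted(changed_counters(prev, cur).items()):
            print(f"{name}: {delta:+d}")
        prev = cur
```

Calling `monitor("eth0")` while the test is running (the interface name is just a placeholder) would then show which counters, error or otherwise, are increasing.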
I'll try what that reports!
Also, what happens if you ping a host on the internet (*through* the
router instead of *to* it)?
Same issue, but twice as pronounced, as it seems all interfaces are
affected.
So if I ping on one interface, the second one has the issue too.
All traffic across the host is also affected, on both sides, so the
ping time to the internet increases by 2x.
Right, so even an unloaded interface suffers? But this is the same NIC,
right? So it could still be a hardware issue...
Yep, the default that CentOS ships. I just tested 4.12.5, and the issue
does not happen there either. So I guess I can bisect it then... (really
don't want to 😃)
Well that at least narrows it down :)
I just tested 5.9.4, which also seems to fix it partly: I have long
stretches where it looks good, and then some increases again. (Stock 3.10
has them too, but not as high, rather 1-3 ms.)
for example:
64 bytes from x.x.x.x: icmp_seq=10 ttl=64 time=0.169 ms
64 bytes from x.x.x.x: icmp_seq=11 ttl=64 time=5.53 ms
64 bytes from x.x.x.x: icmp_seq=12 ttl=64 time=9.44 ms
64 bytes from x.x.x.x: icmp_seq=13 ttl=64 time=0.167 ms
64 bytes from x.x.x.x: icmp_seq=14 ttl=64 time=3.88 ms
and then again:
64 bytes from x.x.x.x: icmp_seq=15 ttl=64 time=0.569 ms
64 bytes from x.x.x.x: icmp_seq=16 ttl=64 time=0.148 ms
64 bytes from x.x.x.x: icmp_seq=17 ttl=64 time=0.286 ms
64 bytes from x.x.x.x: icmp_seq=18 ttl=64 time=0.257 ms
64 bytes from x.x.x.x: icmp_seq=19 ttl=64 time=0.220 ms
64 bytes from x.x.x.x: icmp_seq=20 ttl=64 time=0.125 ms
64 bytes from x.x.x.x: icmp_seq=21 ttl=64 time=0.188 ms
64 bytes from x.x.x.x: icmp_seq=22 ttl=64 time=0.202 ms
64 bytes from x.x.x.x: icmp_seq=23 ttl=64 time=0.195 ms
64 bytes from x.x.x.x: icmp_seq=24 ttl=64 time=0.177 ms
64 bytes from x.x.x.x: icmp_seq=25 ttl=64 time=0.242 ms
64 bytes from x.x.x.x: icmp_seq=26 ttl=64 time=0.339 ms
64 bytes from x.x.x.x: icmp_seq=27 ttl=64 time=0.183 ms
64 bytes from x.x.x.x: icmp_seq=28 ttl=64 time=0.221 ms
64 bytes from x.x.x.x: icmp_seq=29 ttl=64 time=0.317 ms
64 bytes from x.x.x.x: icmp_seq=30 ttl=64 time=0.210 ms
64 bytes from x.x.x.x: icmp_seq=31 ttl=64 time=0.242 ms
64 bytes from x.x.x.x: icmp_seq=32 ttl=64 time=0.127 ms
64 bytes from x.x.x.x: icmp_seq=33 ttl=64 time=0.217 ms
64 bytes from x.x.x.x: icmp_seq=34 ttl=64 time=0.184 ms
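To compare kernels on more than an eyeballed ping log, the spikes can be pulled out of the output with a small script. The following is only a sketch: the function names are invented here, and the 1 ms threshold is an arbitrary assumption chosen because the good replies above sit well below it.

```python
import re

# Matches reply lines such as:
#   64 bytes from x.x.x.x: icmp_seq=11 ttl=64 time=5.53 ms
PING_RE = re.compile(r"icmp_seq=(\d+) .*?time=([\d.]+) ms")


def parse_ping(text):
    """Return a list of (icmp_seq, rtt_ms) tuples found in ping output."""
    samples = []
    for line in text.splitlines():
        m = PING_RE.search(line)
        if m:
            samples.append((int(m.group(1)), float(m.group(2))))
    return samples


def spikes(samples, threshold_ms=1.0):
    """Return the samples whose RTT exceeds threshold_ms (assumed cutoff)."""
    return [(seq, rtt) for seq, rtt in samples if rtt > threshold_ms]
```

Fed the first five replies above (icmp_seq 10 to 14), `spikes(parse_ping(log))` returns the entries for icmp_seq 11, 12 and 14; the second block (icmp_seq 15 to 34) yields none.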
To me it now looks like there was some fix between 5.4.60 and 5.9.4 ...
can anyone pinpoint it?
How did you configure the new kernel? Did you start from scratch, or is
it based on the old CentOS config?
First oldconfig, and from there I added additional options for IB,
NVMe, etc. (which I don't really need on the routers).
OK, so you're probably building with roughly the same options in terms
of scheduling granularity etc. That's good. Did you enable spectre
mitigations etc on the new kernel? What's the output of
`tail /sys/devices/system/cpu/vulnerabilities/*` ?
mitigations are off
==> /sys/devices/system/cpu/vulnerabilities/itlb_multihit <==
KVM: Vulnerable
==> /sys/devices/system/cpu/vulnerabilities/l1tf <==
Mitigation: PTE Inversion; VMX: vulnerable
==> /sys/devices/system/cpu/vulnerabilities/mds <==
Vulnerable; SMT vulnerable
==> /sys/devices/system/cpu/vulnerabilities/meltdown <==
Vulnerable
==> /sys/devices/system/cpu/vulnerabilities/spec_store_bypass <==
Vulnerable
==> /sys/devices/system/cpu/vulnerabilities/spectre_v1 <==
Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
==> /sys/devices/system/cpu/vulnerabilities/spectre_v2 <==
Vulnerable, STIBP: disabled
==> /sys/devices/system/cpu/vulnerabilities/srbds <==
Not affected
==> /sys/devices/system/cpu/vulnerabilities/tsx_async_abort <==
Not affected
Grub Boot options are: crashkernel=896M rd.lvm.lv=cl/root net.ifnames=0
biosdevname=0 scsi_mod.use_blk_mq=1 dm_mod.use_blk_mq=y mitigations=off
console=tty0 console=ttyS1,115200
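When comparing several kernels, it may be easier to collect the same sysfs information programmatically than to paste `tail` output each time. A minimal sketch follows; the function names are hypothetical, and it assumes only that each file in the directory holds a one-line status string as shown above.

```python
from pathlib import Path


def read_vulnerabilities(directory="/sys/devices/system/cpu/vulnerabilities"):
    """Return {file_name: status_string} for every file in the directory."""
    return {
        p.name: p.read_text().strip()
        for p in sorted(Path(directory).iterdir())
        if p.is_file()
    }


def mitigated(status):
    """Return the names whose status string starts with "Mitigation"."""
    return sorted(name for name, value in status.items() if value.startswith("Mitigation"))
```

With the output above, `mitigated()` would list only l1tf, which still reports "Mitigation: PTE Inversion" even with mitigations=off on the command line.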
-Toke
_______________________________________________
Bloat mailing list
[email protected]
https://lists.bufferbloat.net/listinfo/bloat