Can you post:

    ethtool -i eth0
    ethtool -k eth0
    grep HZ /boot/config....

(What is the HZ value of your kernel?)

I suspect a possible problem with TSO autodefer when/if HZ < 1000.
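If it is easier, something along these lines should pull out just the
bits relevant to the HZ / TSO question; the kernel config path below is
only the usual location and the grep patterns are just for convenience,
so adjust them if your distro differs:

    # offload flags relevant to the TSO autodefer question
    ethtool -k eth0 | grep -E 'segmentation-offload|scatter-gather'

    # timer frequency the kernel was built with; CONFIG_HZ=250 or =100
    # would be the interesting case, given the HZ < 1000 concern above
    grep 'CONFIG_HZ' /boot/config-$(uname -r)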
Thanks.

On Thu, 2017-01-26 at 21:19 +0100, Hans-Kristian Bakke wrote:
> There are two packet captures from fq with and without pacing here:
>
> https://owncloud.proikt.com/index.php/s/KuXIl8h8bSFH1fM
>
> The server (with fq pacing/nopacing) is 10.0.5.10 and is running an
> Apache2 webserver on TCP port 443. The TCP client is an nginx reverse
> proxy at 10.0.5.13 on the same subnet, which in turn proxies the
> connection from the Windows 10 client.
> - I did try to connect directly to the server with the client (via a
>   Linux gateway router), avoiding the nginx proxy and just using
>   plain no-SSL HTTP. That did not change anything.
> - I also tried stopping the eth0 interface to force the traffic onto
>   the eth1 interface in the LACP bond, which changed nothing.
> - I also pulled each of the cables on the switch to force the traffic
>   to switch between interfaces in the LACP link between the client
>   switch and the server switch.
>
> The CPU is a 5-6 year old Intel Xeon X3430 CPU @ 4x2.40GHz on a
> SuperMicro platform. It is not very loaded and the results are always
> in the same ballpark with fq pacing on.
>
> top - 21:12:38 up 12 days, 11:08,  4 users,  load average: 0.56, 0.68, 0.77
> Tasks: 1344 total,   1 running, 1343 sleeping,   0 stopped,   0 zombie
> %Cpu0  :  0.0 us,  1.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu1  :  0.0 us,  0.3 sy,  0.0 ni, 97.4 id,  2.0 wa,  0.0 hi,  0.3 si,  0.0 st
> %Cpu2  :  0.0 us,  2.0 sy,  0.0 ni, 96.4 id,  1.3 wa,  0.0 hi,  0.3 si,  0.0 st
> %Cpu3  :  0.7 us,  2.3 sy,  0.0 ni, 94.1 id,  3.0 wa,  0.0 hi,  0.0 si,  0.0 st
> KiB Mem : 16427572 total,   173712 free,  9739976 used,  6513884 buff/cache
> KiB Swap:  6369276 total,  6126736 free,   242540 used.  6224836 avail Mem
>
> This seems OK to me. It does have 24 drives in 3 ZFS pools at 144 TB
> raw storage in total, with several SAS HBAs that are pretty much
> always poking the system in some way or other.
>
> There are around 32K interrupts when running at 23 MB/s (as seen in
> Chrome downloads) with pacing on, and about 25K interrupts when
> running at 105 MB/s with fq nopacing. Is that normal?
>
> Hans-Kristian
>
> On 26 January 2017 at 20:58, David Lang <[email protected]> wrote:
>     Is there any CPU bottleneck?
>
>     Pacing causing this sort of problem makes me think that the CPU
>     either can't keep up or that something (HZ setting type of thing)
>     is delaying when the CPU can get used.
>
>     It's not clear from the posts if the problem is with sending
>     data or receiving data.
>
>     David Lang
>
>     On Thu, 26 Jan 2017, Eric Dumazet wrote:
>         Nothing jumps on my head.
>
>         We use FQ on links varying from 1Gbit to 100Gbit, and we
>         have no such issues.
>
>         You could probably check on the server the various TCP
>         infos given by the ss command:
>
>             ss -temoi dst <remoteip>
>
>         The pacing rate is shown. You might have some issues, but
>         it is hard to say.
>
>         On Thu, 2017-01-26 at 19:55 +0100, Hans-Kristian Bakke wrote:
>             After some more testing I see that if I disable fq
>             pacing the performance is restored to the expected
>             levels:
>
>                 # for i in eth0 eth1; do tc qdisc replace dev $i root fq nopacing; done
>
>             Is this expected behaviour? There is some background
>             traffic, but only in the sub-100 mbit/s range on the
>             switches and gateway between the server and client.
>             The chain:
>             Windows 10 client -> 1000 mbit/s -> switch ->
>             2 x gigabit LACP -> switch -> 4 x gigabit LACP ->
>             gw (fq_codel on all NICs) -> 4 x gigabit LACP (the same
>             as in) -> switch -> 2 x LACP -> server (with misbehaving
>             fq pacing)
>
>             On 26 January 2017 at 19:38, Hans-Kristian Bakke
>             <[email protected]> wrote:
>                 I can add that this is without BBR, just plain old
>                 kernel 4.8 cubic.
>
>                 On 26 January 2017 at 19:36, Hans-Kristian Bakke
>                 <[email protected]> wrote:
>                     Another day, another fq issue (or user error).
>
>                     I am trying to do the seemingly simple task of
>                     downloading a single large file over the local
>                     gigabit LAN from a physical server running
>                     kernel 4.8 and sch_fq on Intel server NICs.
>
>                     For some reason it wouldn't go past around
>                     25 MB/s. After having replaced SSL with no SSL,
>                     replaced Apache with nginx, and verified that
>                     there is plenty of bandwidth available between
>                     my client and the server, I tried to change the
>                     qdisc from fq to pfifo_fast. It instantly shot
>                     up to around the expected 85-90 MB/s. The same
>                     happened with fq_codel in place of fq.
>
>                     I then checked the statistics for fq, and the
>                     throttled counter is increasing massively every
>                     second (eth0 and eth1 are LACPed using Linux
>                     bonding, so both are seen here):
>
>                     qdisc fq 8007: root refcnt 2 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140 refill_delay 40.0ms
>                      Sent 787131797 bytes 520082 pkt (dropped 15, overlimits 0 requeues 0)
>                      backlog 98410b 65p requeues 0
>                      15 flows (14 inactive, 1 throttled)
>                      0 gc, 2 highprio, 259920 throttled, 15 flows_plimit
>                     qdisc fq 8008: root refcnt 2 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140 refill_delay 40.0ms
>                      Sent 2533167 bytes 6731 pkt (dropped 0, overlimits 0 requeues 0)
>                      backlog 0b 0p requeues 0
>                      24 flows (24 inactive, 0 throttled)
>                      0 gc, 2 highprio, 397 throttled
>
>                     Do you have any suggestions?
>
>                     Regards,
>                     Hans-Kristian

_______________________________________________
Bloat mailing list
[email protected]
https://lists.bufferbloat.net/listinfo/bloat
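For the ss check Eric suggests in the quoted thread above, a minimal
sketch of how it might look when run on the server; the address
10.0.5.13 is simply the nginx proxy mentioned in the thread, so
substitute the actual peer if testing without the proxy:

    # per-connection TCP state for flows towards the proxy:
    # -t TCP, -e extended, -m socket memory, -o timers, -i internal
    # TCP info (the -i block includes the pacing_rate the kernel has
    # computed for the flow, which fq enforces)
    ss -temoi dst 10.0.5.13

Repeating this a few times during a transfer and comparing the
reported pacing_rate with the observed ~25 MB/s (roughly 200 Mbit/s)
should show whether the pacing rate itself is the limiter.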
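Similarly, for flipping pacing on and off while watching the fq
counters, a small sketch along the lines of the nopacing loop quoted
above; eth0 and eth1 are the bond slaves named in the thread, and the
one-second interval is arbitrary:

    # disable pacing on both bond slaves, as in the thread ...
    for i in eth0 eth1; do tc qdisc replace dev $i root fq nopacing; done

    # ... or restore the default fq behaviour (pacing enabled)
    for i in eth0 eth1; do tc qdisc replace dev $i root fq; done

    # watch the fq statistics during a transfer; a rapidly climbing
    # "throttled" counter means fq is delaying packets to enforce the
    # per-flow pacing rate
    watch -n 1 'tc -s qdisc show dev eth0; tc -s qdisc show dev eth1'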
