Robert Chacon <[email protected]> writes:

> Toke,
>
> Thank you very much for pointing me in the right direction.
> I am having some fun in the lab tinkering with the 'mq' qdisc and
> Jesper's xdp-cpumap-tc.
> It seems I will need to use iptables or nftables to filter packets to
> corresponding queues, since mq apparently cannot have u32 filters on
> its root.
> I will try to familiarize myself with iptables and nftables, and
> hopefully get it working soon and report back. Thank you!
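One way to do that filtering is to mark packets with iptables and match
the mark with tc's fw classifier on a classful qdisc attached under one
of mq's per-queue classes. A rough sketch, not tested config; the
interface, address, mark, rate, and handles below are placeholders:

    # mq exposes one class per hardware TX queue; hang a classful qdisc
    # (here htb) under the first queue so it can carry filters:
    tc qdisc add dev eth0 root handle 7fff: mq
    tc qdisc add dev eth0 parent 7fff:1 handle 2: htb
    tc class add dev eth0 parent 2: classid 2:11 htb rate 25mbit

    # mark one customer's traffic in the mangle table...
    iptables -t mangle -A POSTROUTING -d 100.64.0.2 -j MARK --set-mark 0x11

    # ...and map that mark to the class with the fw classifier:
    tc filter add dev eth0 parent 2: protocol ip handle 0x11 fw flowid 2:11

The nftables equivalent of the MARK rule would be "meta mark set 0x11"
in a postrouting chain of a mangle-type table.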
Cool - adding in Jesper, maybe he has some input on this :)

-Toke

> On Fri, Jan 15, 2021 at 5:30 AM Toke Høiland-Jørgensen <[email protected]>
> wrote:
>
>> Robert Chacon <[email protected]> writes:
>>
>> >> Cool! What kind of performance are you seeing? The README mentions
>> >> being limited by the BPF hash table size, but can you actually
>> >> shape 2000 customers on one machine? On what kind of hardware and
>> >> at what rate(s)?
>> >
>> > On our production network our peak throughput is 1.5Gbps from 200
>> > clients, and it works very well.
>> > We use a simple consumer-class AMD 2700X CPU in production because
>> > utilization of the shaper VM is ~15% at 1.5Gbps load.
>> > Customers get reliably capped within ±2Mbps of their allocated
>> > htb/fq_codel bandwidth, which is very helpful to control network
>> > congestion.
>> >
>> > Here are some graphs from RRUL performed on our test bench hypervisor:
>> > https://raw.githubusercontent.com/rchac/LibreQoS/main/docs/fq_codel_1000_subs_4G.png
>> > In that example, bandwidth for the "subscriber" client VM was set to
>> > 4Gbps. 1000 IPv4 IPs and 1000 IPv6 IPs were in the filter hash table
>> > of LibreQoS.
>> > The test bench server has an AMD 3900X running Ubuntu in Proxmox.
>> > Pushing 4Gbps utilizes about 10% of the VM's 12 cores.
>> > Paravirtualized VirtIO network drivers are used and most offloading
>> > types are enabled.
>> > In our setup, VM networking multiqueue isn't enabled (it kept
>> > disrupting traffic flow), so 6Gbps is probably the most it can
>> > achieve like this. Our qdiscs in this VM may be limited to one core
>> > because of that.
>>
>> I suspect the issue you had with multiqueue is that it requires per-CPU
>> partitioning on a per-customer basis to work well. This is possible to
>> do with XDP, as Jesper demonstrates here:
>>
>> https://github.com/netoptimizer/xdp-cpumap-tc
>>
>> With this it should be possible to scale the hardware queues across
>> multiple CPUs properly, and you should be able to go to much higher
>> rates by just throwing more CPU cores at it. At least on bare metal;
>> not sure if the VM virt-drivers have the needed support yet...
>>
>> -Toke
>
> --
> *Robert Chacón* Owner
> *M* (915) 730-1472
> *E* [email protected]
> *JackRabbit Wireless LLC*
> P.O. Box 222111
> El Paso, TX 79913
> *jackrabbitwireless.com* <http://jackrabbitwireless.com>
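For anyone following along, the per-customer shaping described in the
quoted thread (an htb class per subscriber with an fq_codel leaf) can be
sketched roughly as below; the interface, rates, address, and handles
are illustrative placeholders, not the actual LibreQoS configuration:

    # root htb with an overall cap and one class per customer,
    # each class getting its own fq_codel leaf:
    tc qdisc add dev eth0 root handle 1: htb
    tc class add dev eth0 parent 1: classid 1:1 htb rate 4gbit
    tc class add dev eth0 parent 1:1 classid 1:11 htb rate 25mbit ceil 25mbit
    tc qdisc add dev eth0 parent 1:11 fq_codel

    # steer one subscriber's address into their class:
    tc filter add dev eth0 parent 1: protocol ip u32 \
        match ip dst 100.64.0.2/32 flowid 1:11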
