Yeah, I agree that at 1 Gbit you don't need multiple receive queues to get to line rate. In my 100Gbit tests, I got to 50 Gbps with CAKE (I should really post some graphs of that), so at really high speeds we would benefit from being able to run simultaneously on multiple CPUs. But let's just say that turning CAKE into something that can run on multiple CPUs simultaneously is non-trivial... :)
> In any case, the MQ qdisc simply sorts packets into hardware queues
> according to the CPU they were submitted from. [...] But it's
> basically useless on [...] a machine acting primarily as a router,
> since the traffic is submitted from just one or two CPUs at a time,
> and usually most of the CPUs are idle anyway.

Not quite. On a router, the distribution of packets over CPUs will depend on what happens on the receive side. Usually, the hardware will have the same number of receive queues as transmit queues, and it will use Receive Side Scaling (RSS), which hashes header fields of each packet (such as addresses and ports) to pick a receive queue. Often, the hardware queues are not assigned properly to different CPUs, which is why the first thing 10Gbit+ performance tuning guides tell you to do is to adjust the CPU mapping of the hardware queue IRQs...

> I have no idea what the hardware does to coalesce those packets into a
> single stream to be sent over the wire.

That's hardware-specific, but I think most devices do something that more or less corresponds to round-robin scheduling of the hardware queues.

-Toke

_______________________________________________
Cake mailing list
https://lists.bufferbloat.net/listinfo/cake
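To make the receive-side argument concrete, here is a rough sketch of how RSS spreads forwarded flows over queues, and how a round-robin transmit scheduler would then interleave them. Several things here are assumptions for illustration only: the Toeplitz hash is what many NICs use for RSS, but the actual function is hardware-specific; `RSS_KEY` is the widely published Microsoft sample key, not any particular driver's key; real NICs look up the queue in an indirection table rather than taking a plain modulo; and the sketch assumes a forwarded packet leaves on the TX queue with the same index as the RX queue (and therefore CPU) it arrived on. All function names are hypothetical.

```python
import ipaddress
import struct
from itertools import zip_longest

# Assumption: the commonly published Microsoft sample RSS key (40 bytes),
# used here purely for illustration.
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0d0ca2bcb"
    "ae7b30b477cb2da38030f20c6a42b73bbeac01fa"
)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash of `data`, the hash many NICs use for RSS."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    if key_bits < len(data) * 8 + 32:
        raise ValueError("key too short for input")
    result = 0
    for bit_index in range(len(data) * 8):
        if data[bit_index // 8] & (0x80 >> (bit_index % 8)):
            # XOR in the 32-bit key window starting at this bit position.
            result ^= (key_int >> (key_bits - 32 - bit_index)) & 0xFFFFFFFF
    return result


def flow_tuple(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Hypothetical helper: IPv4/TCP 4-tuple in network byte order (12 bytes)."""
    return (ipaddress.IPv4Address(src_ip).packed
            + ipaddress.IPv4Address(dst_ip).packed
            + struct.pack("!HH", src_port, dst_port))


def rss_queue(flow: bytes, num_queues: int) -> int:
    """Pick an RX queue for a flow. Simplification: modulo instead of an
    indirection table. Every packet of a flow lands on the same queue."""
    return toeplitz_hash(RSS_KEY, flow) % num_queues


def round_robin_tx(queues: list[list[str]]) -> list[str]:
    """Simplified wire order: take one packet from each non-empty TX queue
    per round, as a round-robin hardware scheduler would."""
    order = []
    for one_round in zip_longest(*queues):
        order.extend(pkt for pkt in one_round if pkt is not None)
    return order


if __name__ == "__main__":
    num_queues = 4
    flows = [flow_tuple("10.0.0.1", "10.0.1.1", 40000 + i, 443) for i in range(8)]
    tx_queues: list[list[str]] = [[] for _ in range(num_queues)]
    for n, flow in enumerate(flows):
        # Assumption: RX queue index == CPU == TX queue index for forwarded traffic.
        tx_queues[rss_queue(flow, num_queues)].append(f"flow{n}")
    print("per-queue:", tx_queues)
    print("wire order:", round_robin_tx(tx_queues))
```

The point of the sketch is that, with RSS enabled and the queue IRQs spread across CPUs, forwarded traffic is not confined to one or two CPUs: which TX queue a flow ends up on follows from its header hash on the receive side.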
