Here are my current patches for comments.

-- 
Richard Russo
to...@enslaves.us

On Fri, Jul 5, 2019, at 12:23 PM, Richard Russo wrote:
> Hi,
>
> I've been experimenting with Receive Side Scaling (RSS) for a TCP proxy
> application. The basic idea with RSS is that by configuring the NICs,
> kernel, and application to use the same CPU for a given socket, cross-CPU
> locking and communication is eliminated or at least significantly
> reduced. On my system, configuring RSS allowed me to handle about three
> times as many sessions before reaching CPU saturation. The remaining
> bottleneck seems to be kernel processing around socket creation and
> closing, which requires cross-CPU coordination.
>
> Aligning the incoming sockets is very simple: setting a socket option
> (IP_RSS_LISTEN_BUCKET) on the listen socket restricts the accepted
> sockets to that bucket, and that's straightforward to add to the TCP
> listener code and configuration.
>
> Aligning outgoing sockets is trickier -- there's no kernel help with a
> socket option or otherwise. An application has to run the hash
> (Toeplitz) on the 4-tuple of {local ip, local port, remote ip, remote
> port} and only use an outgoing port if the hash matches. I've had
> trouble finding a good approach to handle this.
>
> The simplest thing would be to run the hash when a port is assigned by
> port_range and return the port if it hashes to the wrong bucket; but if
> you've already used all the acceptable ports for that port range, you
> spend a lot of time hashing the ports that are still in the range,
> without making any progress.
>
> If you have a port range per RSS bucket, you could hash on port
> assignment and not return the ports that hash to a wrong bucket; but
> if the remote IP changes, because you've configured it to use DNS or
> because you change the IP via "set server addr", the previously
> computed hashes are no longer valid -- you would really want to try
> all the ports again.
>
> What I ended up with was a lock on port ranges (instead of atomics as
> used in 07425de71777b688e77a9c70a7088c13e66e41e9 BUG/MEDIUM:
> port_range: Make the ring buffer lock-free), adding a revision counter
> to the port range, and resetting the port range whenever the server IP
> changed. To avoid running the hash during steady state, and because
> checking all the ports whenever the range needs to be filled is
> expensive, I also made port range filling incremental.
>
> This approach works, but it feels complicated, and it made my config
> much more verbose -- I had to duplicate my frontend sections, one for
> each RSS bucket, each sending to a corresponding duplicated backend;
> the backends had additional configuration to indicate the RSS bucket
> (and the number of buckets). Incidentally, because each RSS bucket has
> a distinct set of ports, and because my use case doesn't use any
> features which benefit from coordination within HAProxy (such as stick
> tables etc.), this makes it possible to run in process mode rather
> than threaded mode without running into the many "port already in use"
> warnings/errors that would otherwise happen when sharing a port range.
>
> If it's helpful for the discussion, I can share my patches as-is, but
> if there are better ideas on how to structure this, I'd rather try to
> get the changes done in a nice way before sharing.
>
> Thanks!
>
> --
> Richard Russo
> to...@enslaves.us

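For anyone who wants to see the outgoing-port check concretely, here is a rough sketch of the idea described in the quoted message: hash the 4-tuple with Toeplitz and keep a local port only if it lands in the wanted bucket. This is not the code from the attached patches. The names rss_hash_tcp4() and pick_local_port() are made up for illustration, the key is the sample verification key from Microsoft's RSS documentation (a real setup must use whatever key the NIC and kernel are configured with), and the bucket mapping (low bits of the hash, power-of-two bucket count) and the input ordering (fields as they appear in a received packet, so remote first) are assumptions that should be checked against the actual kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sample 40-byte key from Microsoft's RSS verification suite.
 * Assumption: replace with the key your NIC/kernel actually uses. */
static const uint8_t rss_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/* Toeplitz hash: for every set input bit (MSB first), XOR in the
 * 32-bit window of the key that starts at that bit position. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t keylen,
                              const uint8_t *data, size_t datalen)
{
    uint32_t hash = 0;
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];

    for (size_t i = 0; i < datalen; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            /* Slide the window one bit and pull in the next key bit. */
            window <<= 1;
            if (i + 4 < keylen && (key[i + 4] & (1u << bit)))
                window |= 1u;
        }
    }
    return hash;
}

/* Hash an IPv4/TCP 4-tuple as seen in a received packet. All values
 * are in host byte order and are serialized big-endian here. */
static uint32_t rss_hash_tcp4(uint32_t src_ip, uint32_t dst_ip,
                              uint16_t src_port, uint16_t dst_port)
{
    uint8_t in[12] = {
        (uint8_t)(src_ip >> 24), (uint8_t)(src_ip >> 16),
        (uint8_t)(src_ip >> 8),  (uint8_t)src_ip,
        (uint8_t)(dst_ip >> 24), (uint8_t)(dst_ip >> 16),
        (uint8_t)(dst_ip >> 8),  (uint8_t)dst_ip,
        (uint8_t)(src_port >> 8), (uint8_t)src_port,
        (uint8_t)(dst_port >> 8), (uint8_t)dst_port,
    };
    return toeplitz_hash(rss_key, sizeof(rss_key), in, sizeof(in));
}

/* Hypothetical helper: return the first local port in [lo, hi] whose
 * reply packets (remote -> local) would hash to `bucket`, or -1.
 * Assumption: bucket = hash & (nbuckets - 1), nbuckets a power of two. */
static int pick_local_port(uint32_t local_ip, uint32_t remote_ip,
                           uint16_t remote_port, uint16_t lo, uint16_t hi,
                           uint32_t bucket, uint32_t nbuckets)
{
    assert(nbuckets > 0 && (nbuckets & (nbuckets - 1)) == 0);

    for (uint32_t p = lo; p <= hi; p++) {
        uint32_t h = rss_hash_tcp4(remote_ip, local_ip,
                                   remote_port, (uint16_t)p);
        if ((h & (nbuckets - 1)) == bucket)
            return (int)p;
    }
    return -1;
}
```

The linear scan in pick_local_port() is exactly the cost the quoted message is worried about: once the matching ports in a range are used up, every further attempt hashes ports that cannot succeed. That is why caching the results per bucket, and invalidating them when the remote address changes, looks attractive. The hash function itself can be checked against the verification vectors published in Microsoft's RSS documentation.
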
0001-Allow-for-binding-listen-sockets-to-a-provided-RSS-b.patch
Description: Binary data
0002-Revert-BUG-MEDIUM-port_range-Make-the-ring-buffer-lo.patch
Description: Binary data
0003-add-port_range-locking-to-protect-against-concurrent.patch
Description: Binary data
0004-refill-port-ranges-when-addresses-change.patch
Description: Binary data
0005-Allow-for-RSS-aligned-port-selection-for-outgoing-co.patch
Description: Binary data