On Mon, 2 Aug 2010, Brett Glass wrote:
> http://www.computerworld.com/s/article/9180022/Latest_Linux_kernel_uses_Google_made_protocols
> describes SMP optimizations to the Linux kernel (the article mistakenly
> calls them "protocols," but they're not) which steer the processing of
> incoming network packets to the CPU core that is running the process for
> which they're destined. (Doing this requires code which straddles network
> layers in interesting ways.) The article claims that these optimizations are
> Google's invention, though they simply seem like a common sense way to make
> the best use of CPU cache.
> The article claims dramatic performance improvements due to this
> optimization. Anything like this in the works for FreeBSD?
Quite a few systems do things like this, although perhaps not the exact
formula that Google has. For example, Solarflare's TCP onload engine
programs their hardware to direct 5-tuples to specific queues for use by
specific processes. Likewise, Chelsio's recently committed TCAM programming
code allows work to be similarly directed to specific queues (and generally
CPUs), although not in a way tightly integrated with the network stack.
I'm currently doing some work for Juniper to add affinity features up and down
the stack. Right now my prototype does this with RSS but doesn't attempt to
expose specific flow affinity to userspace, or allow userspace to direct
affinity. I have some early hacks at socket options to do that, although my
goal was to perform flow direction in hardware (i.e., have the network stack
program the TCAM on the T3 cards) rather than do the redirection in software.
However, some recent experiments I ran that did work distribution to the
per-CPU netisr workers I added in FreeBSD 8 were surprisingly effective -- not
as good as distribution in hardware, but still significantly more throughput
on an 8-core system (in this case I used RSS hashes generated by the
hardware).
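The idea behind that experiment can be sketched in a few lines: since every
packet of a flow carries the same hardware-generated RSS hash, taking the hash
modulo the worker count keeps each flow on one netisr worker (and hence one
CPU), so its connection state stays in that core's cache. This is only a
simplified illustration, not the actual netisr code; the worker count and
function name are made up for the example.

```c
#include <stdint.h>

#define NWORKERS 8	/* hypothetical: one netisr worker per core */

/*
 * Pick a worker (and thus a CPU) from the hardware-supplied RSS hash.
 * All packets of a flow hash identically, so the whole flow lands on
 * one worker and its cache stays warm -- no locking across cores for
 * per-flow state.
 */
static inline unsigned int
rss_hash_to_worker(uint32_t rss_hash)
{

	return (rss_hash % NWORKERS);
}
```

A real implementation would mask rather than modulo when NWORKERS is a power
of two, and would have to cope with hashes the hardware declined to compute.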
Adding some sort of software redirection affinity table wouldn't be all that
difficult, but I'll continue to focus on hardware distribution for the time
being -- several cards out there will support the model pretty nicely. The
only real limitations there are (a) which cards support it -- not Intel NICs,
I'm afraid -- and (b) the sizes of the hardware flow direction tables -- usually
in the thousands to tens of thousands range.
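For what it's worth, the software redirection affinity table mentioned above
could be as simple as a fixed-size array indexed by flow hash, holding a CPU
override that userspace (say, via a socket option) could set. This is just a
sketch under assumed names -- afftab, AFFTAB_SIZE, and the functions are all
invented for illustration, not anything in the tree.

```c
#include <stdint.h>

#define AFFTAB_SIZE 4096	/* hypothetical; hardware tables run from
				   thousands to tens of thousands of entries */

/* -1 means "no override; fall back to the default RSS distribution". */
static int16_t afftab[AFFTAB_SIZE];

static void
afftab_init(void)
{
	int i;

	for (i = 0; i < AFFTAB_SIZE; i++)
		afftab[i] = -1;
}

/*
 * Pin all flows hashing to this slot to the given CPU, e.g. on behalf
 * of a process that requested affinity.
 */
static void
afftab_set(uint32_t flowhash, int16_t cpu)
{

	afftab[flowhash % AFFTAB_SIZE] = cpu;
}

/* Look up the CPU for an incoming packet's flow hash. */
static int16_t
afftab_lookup(uint32_t flowhash, int16_t default_cpu)
{
	int16_t cpu = afftab[flowhash % AFFTAB_SIZE];

	return (cpu >= 0 ? cpu : default_cpu);
}
```

The obvious cost is that distinct flows sharing a slot get pinned together,
which is exactly the aliasing problem the larger hardware tables exist to
reduce.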
Robert
_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-chat
To unsubscribe, send any mail to "[email protected]"