Re: [RFC PATCH 0/5] net: low latency Ethernet device polling

Rick Jones Wed, 27 Feb 2013 11:58:19 -0800

On 02/27/2013 09:55 AM, Eliezer Tamir wrote:

This patchset adds the ability for the socket layer code to poll directly
on an Ethernet device's RX queue. This eliminates the cost of the interrupt
and context switch and with proper tuning allows us to get very close
to the HW latency.


This is a follow up to Jesse Brandeburg's Kernel Plumbers talk from last year
http://www.linuxplumbersconf.org/2012/wp-content/uploads/2012/09/2012-lpc-Low-Latency-Sockets-slides-brandeburg.pdf

Patch 1 adds ndo_ll_poll and the IP code to use it.
Patch 2 is an example of how TCP can use ndo_ll_poll.
Patch 3 shows how this method would be implemented for the ixgbe driver.
Patch 4 adds statistics to the ixgbe driver for ndo_ll_poll events.
(Optional) Patch 5 is a handy kprobes module to measure detailed latency
numbers.

This patchset is also available in the following git branch:
git://github.com/jbrandeb/lls.git rfc

Performance numbers:
Kernel   Config     C3/6  rx-usecs  TCP  UDP
3.8rc6   typical    off   adaptive  37k  40k
3.8rc6   typical    off   0*        50k  56k
3.8rc6   optimized  off   0*        61k  67k
3.8rc6   optimized  on    adaptive  26k  29k
patched  typical    off   adaptive  70k  78k
patched  optimized  off   adaptive  79k  88k
patched  optimized  off   100       84k  92k
patched  optimized  on    adaptive  83k  91k
*rx-usecs=0 is usually not useful in a production environment.

I would think that latency-sensitive folks would be using rx-usecs=0 in production - at least if the NIC in use didn't have low enough latency with its default interrupt coalescing/avoidance heuristics.

If I take the first "pure" A/B comparison it seems that the change as benchmarked takes latency for TCP from ~27 usec (37k) to ~14 usec (70k). At what request/response size does the benefit taper off? 13 usec seems to be about 16250 bytes at 10 GbE.
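
For anyone wanting to redo that arithmetic with other rows of the table, here is a minimal sketch. It assumes the TCP/UDP columns are request/response transactions per second, so that latency is simply the reciprocal, and it ignores Ethernet/IP/TCP framing overhead when converting time to bytes. The function names are made up for illustration.

```python
def rr_latency_usec(transactions_per_sec):
    """Average time per request/response transaction, in microseconds."""
    return 1e6 / transactions_per_sec


def bytes_in_time(duration_usec, link_bits_per_sec=10e9):
    """Bytes a link can serialize in duration_usec, ignoring framing overhead."""
    return duration_usec * 1e-6 * link_bits_per_sec / 8


# First "pure" A/B pair from the table: typical config, C3/6 off, adaptive.
before_usec = rr_latency_usec(37_000)   # unpatched 3.8rc6, TCP
after_usec = rr_latency_usec(70_000)    # patched, TCP
saved_usec = before_usec - after_usec

print(f"before: {before_usec:.1f} usec, after: {after_usec:.1f} usec")
print(f"saved:  {saved_usec:.1f} usec")
print(f"13 usec at 10 GbE is roughly {bytes_in_time(13):.0f} bytes")
```

The unrounded saving comes out slightly under 13 usec; the 13 usec figure is the difference of the rounded ~27 and ~14.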

When I last looked at netperf TCP_RR performance where something similar could happen I think it was IPoIB where it was possible to set things up such that polling happened rather than wakeups (perhaps it was with a shim library that converted netperf's socket calls to "native" IB). My recollection is that it "did a number" on the netperf service demands thanks to the spinning. It would be a good thing to include those figures in any subsequent rounds of benchmarking.
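
To make the service demand concern concrete, here is a rough sketch of how the figure relates to the transaction rate. It assumes service demand for an _RR test is CPU microseconds consumed per transaction, which is my understanding of what netperf reports; the function name and the inputs in the example are hypothetical, not measurements.

```python
def service_demand_usec_per_trans(cpu_util_percent, num_cpus, transactions_per_sec):
    """CPU microseconds consumed per transaction.

    cpu_util_percent is the average utilization across num_cpus (0-100).
    """
    cpu_usec_per_sec = cpu_util_percent / 100.0 * num_cpus * 1e6
    return cpu_usec_per_sec / transactions_per_sec


# Hypothetical illustration only: a receiver that spins keeps one core
# fully busy for the whole test, so at 100% of one CPU the service demand
# equals the entire per-transaction time, however low the latency gets.
spinning = service_demand_usec_per_trans(100.0, 1, 70_000)
print(f"spinning on one core at 70k trans/s: {spinning:.1f} usec/trans")
```

A wakeup-driven receiver at a lower transaction rate can still show a lower service demand if its CPU sits idle between packets, which is why the figure is worth reporting next to the latency numbers.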

Am I correct in assuming this is a mechanism which would not be used in a high aggregate PPS situation?


happy benchmarking,

rick jones