Hi,

I won't repeat what David said; he's completely right. I'll respond
to the point below:

> Are there specific traffic profiles which benefit most from the use of
> splicing? As I mentioned earlier I'm not seeing any performance improvement
> while using it - in fact I've seen degradation when transferring large files,
> which suggests to me that I'm possibly not using it appropriately.

It depends a lot on network hardware and drivers. At 10 Gbps, you have almost
no choice, and in fact it works extremely well. On gigabit NICs, I've seen
mixed results. Sometimes only single TCP segments get returned by a
splice() call, resulting in many more calls than recv() would need,
thus showing lower performance. This improved a lot with kernels around
2.6.27, because that was when we started experimenting with splice() and
the developers were very responsive in fixing issues. I still had this case
recently with a 2.6.32.x on an ARM box (the crappy guruplug server I bought).
Splice() was around 10% slower than standard recv(). With 2.6.35.x, it's the
opposite, splice() has become about 10% faster than recv(), so something has
improved.
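To make the syscall-count issue above concrete, here is a minimal sketch of the zero-copy forwarding pattern splice() enables: data moves from one fd into a pipe and from the pipe to another fd without ever being copied to user space. The helper name splice_forward is made up for illustration, and error handling is reduced to the bare minimum.

```c
/* Minimal sketch of zero-copy forwarding with splice(). Data flows
 * src_fd -> pipe -> dst_fd entirely in kernel space. splice_forward
 * is a hypothetical helper name, not an existing API. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

ssize_t splice_forward(int src_fd, int dst_fd)
{
    int pipefd[2];
    ssize_t total = 0;

    if (pipe(pipefd) < 0)
        return -1;

    for (;;) {
        /* pull up to 64kB from the source into the pipe */
        ssize_t in = splice(src_fd, NULL, pipefd[1], NULL, 65536,
                            SPLICE_F_MOVE | SPLICE_F_MORE);
        if (in <= 0)
            break;              /* 0 = EOF, <0 = error */

        /* push everything we got from the pipe to the destination */
        while (in > 0) {
            ssize_t out = splice(pipefd[0], NULL, dst_fd, NULL, in,
                                 SPLICE_F_MOVE | SPLICE_F_MORE);
            if (out <= 0)
                goto done;
            in -= out;
            total += out;
        }
    }
done:
    close(pipefd[0]);
    close(pipefd[1]);
    return total;
}
```

The catch on the receive side is that each splice() returns only what is sitting in the socket buffer, so without GRO/LRO merging packets you can end up making roughly one syscall per TCP segment, which is where the lower-than-recv() numbers come from.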

For splice() to be efficient, you need a NIC that supports LRO or at least a
kernel that supports GRO. I recall an exchange I had with one guy at Zeus
when we were debugging splice(). He was observing significantly lower
performance on a quad-port intel gigabit card with splice() than with recv().
After the fixes he got very similar results, which means that splice() did
not help at all with this card and his version of the driver.

You should play a bit with ethtool. First, check that your NIC supports
TSO and checksum offloads and that GRO is enabled (ethtool -k). Second, see
with "ethtool -c" whether you can reduce the interrupt rate or increase the
RX delay so that the NIC has a chance to merge multiple packets on
the receive path. Obviously you'll need a decent NIC anyway. Don't expect
anything from a realtek or nforce ;-)

From my tests, Myricom's Myri10GE NICs benefit a lot from splice(). That's
the NIC I use for the 10 Gbps tests at only 25% CPU. Another guy I know has
got very good results with intel's 10 GE NICs too; I've seen very low CPU
figures at up to 5 Gbps of production traffic. I remember having noticed a
small improvement on the old PCI-based TG3 NIC on my old notebook (pentium-M
at 1.7 GHz). I haven't done enough tests on e1000 since we got splice().

All in all, at gigabit speeds on decent hardware, the improvements should
be minimal, as we're only talking about avoiding memory copies at 125 MB/s.
Still, on small CPU-bound or FSB-bound hardware, it can be a nice improvement.

Regards,
Willy

