I've benchmarked if_gre and if_gif encapsulation on not-so-powerful hardware. The test is very simple: a single as-fast-as-possible TCP stream through the tunnel to a much more powerful server over a 1G physical interface.
I was surprised by the very high cost of encapsulation with the standard MTU
of 1280 bytes. Performance in this configuration is about 45% of line
speed (440Mbit/s instead of 1Gbit/s). I cannot believe that adding one
header to each packet is SO expensive.
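To put a number on "one header", here is a rough back-of-the-envelope sketch in C. Everything in it is my assumption, not something measured: the helper name goodput_fraction is made up for illustration, the outer header is taken as a plain 20-byte IPv4 header (gif over IPv4, no options), the inner packet as IPv4 + TCP with timestamps (20 + 32 = 52 bytes of headers), and the per-frame Ethernet cost on the wire as 38 bytes.

```c
/*
 * Assumed per-frame Ethernet cost on the wire:
 * preamble + SFD (8) + header (14) + FCS (4) + inter-frame gap (12).
 */
#define ETHER_WIRE_OVERHEAD 38

/*
 * Hypothetical helper: fraction of line rate left for TCP payload.
 *   mtu        - MTU seen by the TCP stack (tunnel or physical)
 *   inner_hdrs - inner IP + TCP header bytes (assumed 20 + 32)
 *   encap_hdr  - extra outer header bytes (0 for no tunnel,
 *                20 assumed for IPv4-in-IPv4)
 */
double
goodput_fraction(int mtu, int inner_hdrs, int encap_hdr)
{
	int payload = mtu - inner_hdrs;
	int on_wire = mtu + encap_hdr + ETHER_WIRE_OVERHEAD;

	return ((double)payload / (double)on_wire);
}
```

Under these assumptions goodput_fraction(1280, 52, 0) is about 0.932 and goodput_fraction(1280, 52, 20) is about 0.918, so the extra bytes on the wire should cost only a percent or two of throughput. The rest of the drop must be per-packet CPU work, not header size.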
Also, in this case the userland thread of the benchmark (nuttcp or iperf3)
consumes 100% of CPU (one core) and the kernel thread (if_config_tqg_0 or
if_config_tqg_1) consumes 100% of CPU (a second core).
The physical interface with an MTU of 1280 and hardware "helpers" switched
off shows 800Mbit/s, and the CPU is far less loaded: userland and kernel
consume about 50% + 50% of CPU, not 100% + 100%, which looks reasonable.
I've traced down half of the problem: the userland CPU consumption comes
from lock contention at "sys/netinet/tcp_usrreq.c:923":
inp = sotoinpcb(so);
KASSERT(inp != NULL, ("tcp_usr_send: inp == NULL"));
********INP_WLOCK(inp);********
if (inp->inp_flags & (INP_TIMEWAIT | INP_DROPPED)) {
This line consumes 75% of all the time spent in sosend() and,
effectively, in userland, via this call chain:
kernel`sosend+0x79
kernel`soo_write+0x6b
kernel`fo_write+0x4a
kernel`dofilewrite+0xcd
kernel`kern_writev+0x79
kernel`sys_write+0x8f
75% of one core is consumed by this line.
Unfortunately, the other party to this contention is not so obvious.
A flame graph without TCO can be found here:
http://lev.serebryakov.spb.ru/_sklad/gif-stacks/no-tco/gif.1280.nuttcp.send.no-tco.svg
It looks very suspicious to me.
BTW, if_gre has exactly the same problem.
--
// Lev Serebryakov
