commit 96c6eb294d6698d9a56882102cb936d326b2a3c7
Author: Sepherosa Ziehau <se...@dragonflybsd.org>
Date:   Mon Sep 3 17:46:58 2012 +0800

tcp: Implement asynchronized pru_rcvd

This mainly avoids the extra scheduling cost that lwkt_domsg() imposes
on the reception path. lwkt_sendmsg() is now used to carry out TCP's
pru_rcvd.

Since TCP's pru_rcvd can be batched, one pru_rcvd netmsg is embedded
in struct socket, so that no netmsg has to be allocated for each
pru_rcvd. This embedded netmsg is the one passed to lwkt_sendmsg().

Whether the embedded pru_rcvd netmsg should be sent is determined by
its MSG_DONE bit. Since the user thread and the netisr thread can run
on different CPUs, the MSG_DONE bit of the embedded pru_rcvd netmsg is
protected by a spinlock.

To cope with the following race, which could drop window updates,
tcp_usr_rcvd() replies the asynchronous rcvd netmsg before calling
tcp_output():

    netisr thread                 user thread

    tcp_usr_rcvd()                sorcvtcp()
    {                             {
      tcp_output()                  :
      :                             :
      :                             sbunlinkmbuf()
      :                             if (rcvd & MSG_DONE) (2)
      :                               lwkt_sendmsg(rcvd)
      :                             :
      lwkt_replymsg(rcvd) (1)
    }

At (2) the window update is dropped: the rcvd netmsg has not yet been
replied at (1), so MSG_DONE is still clear and the user thread does not
send it, while tcp_output() has already run without seeing the space
freed by sbunlinkmbuf().

The result:
On i7-2600 (4C/8T, 3.4GHz):
32 parallel netperf -H 127.0.0.1 -t TCP_STREAM -P0 -l 30
(4 runs, unit: Mbps)

old  30253.88  30242.58  30162.55  30101.51
new  33962.74  33798.70  33499.92  33482.35

This gives a ~12% performance improvement.

Summary of changes:
 sys/kern/uipc_msg.c      |   35 +++++++++++++++++++++++++++++++++++
 sys/kern/uipc_socket.c   |   15 ++++++++++-----
 sys/kern/uipc_socket2.c  |    3 +--
 sys/net/netmsg.h         |    3 +++
 sys/netinet/in_proto.c   |    3 ++-
 sys/netinet/tcp_subr.c   |    3 +++
 sys/netinet/tcp_usrreq.c |   11 +++++++++--
 sys/netinet6/in6_proto.c |    2 +-
 sys/sys/protosw.h        |    1 +
 sys/sys/socketops.h      |    3 +++
 sys/sys/socketvar.h      |    9 ++++++++-
 11 files changed, 76 insertions(+), 12 deletions(-)

http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/96c6eb294d6698d9a56882102cb936d326b2a3c7

-- 
DragonFly BSD source repository
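
The batching scheme described in the commit message can be pictured with a small userspace model. This is only a sketch under stated assumptions, not the DragonFly code: struct sock_model, sock_init(), user_rcvd() and proto_rcvd() are hypothetical names, a pthread mutex stands in for the kernel spinlock, and two counters stand in for the netisr message port and for tcp_output(). It shows the one embedded message being sent only while its MSG_DONE bit is set, and the protocol side replying (setting MSG_DONE again) before it does the output work.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MSG_DONE 0x1            /* set while the embedded message is idle */

struct rcvd_msg {
    int flags;
};

/* Hypothetical stand-in for the relevant part of struct socket. */
struct sock_model {
    pthread_mutex_t lock;       /* stands in for the kernel spinlock */
    struct rcvd_msg rcvd;       /* the one embedded message, never allocated per call */
    int queued;                 /* messages currently "in flight" to the protocol thread */
    int outputs;                /* how often the protocol side ran its output step */
};

void sock_init(struct sock_model *so)
{
    pthread_mutex_init(&so->lock, NULL);
    so->rcvd.flags = MSG_DONE;  /* idle, so the first request will be sent */
    so->queued = 0;
    so->outputs = 0;
}

/* User side: send the embedded message only if it is not already in flight. */
bool user_rcvd(struct sock_model *so)
{
    bool sent = false;

    pthread_mutex_lock(&so->lock);
    if (so->rcvd.flags & MSG_DONE) {
        so->rcvd.flags &= ~MSG_DONE;    /* now owned by the protocol side */
        so->queued++;                   /* models the asynchronous send */
        sent = true;
    }
    pthread_mutex_unlock(&so->lock);
    return sent;                        /* false: request was batched */
}

/* Protocol side: reply first, then do the output work. */
void proto_rcvd(struct sock_model *so)
{
    pthread_mutex_lock(&so->lock);
    assert(so->queued > 0);
    so->queued--;
    so->rcvd.flags |= MSG_DONE;         /* reply: later requests are sent again */
    pthread_mutex_unlock(&so->lock);

    so->outputs++;                      /* models the output step, after the reply */
}
```

In this model a second user_rcvd() issued while the message is in flight returns false and is folded into the pending one. Because proto_rcvd() sets MSG_DONE before its output step, a user_rcvd() that races with the output step sends a fresh message rather than being silently dropped.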