On 01.10.2010 12:01, Sriram Gorti wrote:
Hi,

The following is an observation from testing our XLR/XLS network
driver with 16 concurrent instances of netperf on FreeBSD-CURRENT.
Based on this observation, I have a question that I hope to get
some understanding of here.

When running 16 concurrent netperf instances (each for about 20
seconds), we found that after some number of runs performance
degraded badly (almost by a factor of 5), and all subsequent runs
stayed degraded. We started debugging this from the TCP side, since
other driver tests ran fine for comparably long durations on the
same board and software.

netstat indicated the following:

$ netstat -s -f inet -p tcp | grep discarded
                 0 discarded for bad checksums
                 0 discarded for bad header offset fields
                 0 discarded because packet too short
                 7318 discarded due to memory problems

We then traced the "discarded due to memory problems" drops to the
following counter:

$ sysctl -a net.inet.tcp.reass
net.inet.tcp.reass.overflows: 7318
net.inet.tcp.reass.maxqlen: 48
net.inet.tcp.reass.cursegments: 1594   <--- corresponds to the V_tcp_reass_qsize variable
net.inet.tcp.reass.maxsegments: 1600
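
For context: the overflows counter is bumped each time an out-of-order
segment arrives while the global reassembly pool is already at its
limit; the segment is then dropped and counted under "discarded due to
memory problems". A standalone sketch of that policy, with my own names
mapped onto the sysctls above (not verbatim kernel code; the real check
lives in tcp_reass()):

#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative counters: qsize ~ cursegments, maxseg ~ maxsegments,
 * overflows ~ overflows.
 */
struct reass_stats {
	uint64_t qsize;
	uint64_t maxseg;
	uint64_t overflows;
};

/* Returns true if an out-of-order segment has to be dropped. */
static bool
reass_segment_arrived(struct reass_stats *st, bool in_order)
{
	if (!in_order && st->qsize + 1 >= st->maxseg) {
		st->overflows++;	/* "discarded due to memory problems" */
		return (true);
	}
	if (!in_order)
		st->qsize++;		/* segment queued for later reassembly */
	return (false);
}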

Our guess for why reassembly is needed at all (in this low-packet-loss
test setup) was the lack of per-flow classification in the driver,
which sprays incoming packets across the 16 h/w cpus instead of
steering all packets of a flow to the same cpu. While we work on
addressing this driver limitation, we debugged further to see how/why
V_tcp_reass_qsize grew (assuming that out-of-order segments should
have dropped back to zero at the end of each run). It turned out that
the counter was already growing during the initial runs, but the
performance degradation only appeared once it got close to
maxsegments. We then also looked at vmstat to see how many of the
reassembly segments were lost, but none were. We could not reconcile
"no lost segments" with the growth of this counter across test runs.

A patch is in the works to properly autoscale the reassembly queue
and should be committed shortly.
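
For readers following along, one plausible shape of such autoscaling
(my own guess for illustration, not the actual patch) is to replace the
single global segment pool with a per-connection bound derived from the
socket receive buffer:

#include <stddef.h>

/*
 * Hypothetical per-connection limit: roughly one reassembly entry per
 * MSS-sized chunk of receive buffer space.  Purely illustrative; the
 * real patch may use a different policy.
 */
static size_t
reass_limit(size_t sb_hiwat, size_t maxseg)
{
	if (maxseg == 0)
		maxseg = 536;	/* conservative default MSS */
	return (sb_hiwat / maxseg);
}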

$ sysctl net.inet.tcp.reass ; vmstat -z | egrep "FREE|mbuf|tcpre"
net.inet.tcp.reass.overflows: 0
net.inet.tcp.reass.maxqlen: 48
net.inet.tcp.reass.cursegments: 147
net.inet.tcp.reass.maxsegments: 1600
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
mbuf_packet:            256,      0,    4096,    3200, 5653833,   0,   0
mbuf:                   256,      0,       1,    2048, 4766910,   0,   0
mbuf_cluster:          2048,  25600,    7296,       6,    7297,   0,   0
mbuf_jumbo_page:       4096,  12800,       0,       0,       0,   0,   0
mbuf_jumbo_9k:         9216,   6400,       0,       0,       0,   0,   0
mbuf_jumbo_16k:       16384,   3200,       0,       0,       0,   0,   0
mbuf_ext_refcnt:          4,      0,       0,       0,       0,   0,   0
tcpreass:                20,   1690,       0,     845, 1757074,   0,   0

In view of these observations, my question is: is it possible for the
V_tcp_reass_qsize variable to be updated unsafely on SMP? (The
particular flavor of XLS used in the test had 4 cores with 4 h/w
threads per core.) I see that tcp_reass() assumes some lock is held,
but I am not sure whether it is the per-socket lock or the global tcp
lock.

The updating of the global counter is indeed unsafe and becomes obsolete
with the autotuning patch.
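
To illustrate the hazard (a generic sketch, not the actual tcp_reass()
code): a plain ++/-- on a shared counter is a read-modify-write, so two
cpus updating it under different per-connection locks can lose updates.
Lost decrements leave the counter above the real queue occupancy, which
matches the upward drift you saw across runs. If the counter had to
stay, FreeBSD's atomic(9) primitives would be the fix:

#include <sys/types.h>
#include <machine/atomic.h>	/* FreeBSD atomic(9) primitives */

static u_int reass_qsize;	/* stands in for V_tcp_reass_qsize */

/*
 * Unsafe: two cpus may read the same old value, add/subtract one,
 * and store it back, silently losing one of the two updates.
 */
static void
qsize_inc_racy(void)
{
	reass_qsize++;		/* non-atomic read-modify-write */
}

/* Safe: a single atomic read-modify-write per update. */
static void
qsize_inc(void)
{
	atomic_add_int(&reass_qsize, 1);
}

static void
qsize_dec(void)
{
	atomic_subtract_int(&reass_qsize, 1);
}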

The patch has been reviewed by me and is ready for commit.  However,
lstewart@ is currently writing his thesis and has very little spare
time.  I'll send you the patch in private email so you can continue
your testing.

--
Andre