Andrew,
Thanks for the additional info. We'd like to verify that interrupts
are getting disabled from interrupt context itself. If you don't mind,
could you gather an aggregation of the callers of
mac_hwring_disable_intr() during one of your runs? You should be able
to do this with "dtrace -n fbt::
mac_hwring_disable_intr:entry'{...@[stack()] = count()}'"
Thanks,
Nicolas.
On May 6, 2009, at 6:22 PM, Andrew Gallatin wrote:
Nicolas Droux wrote:
On May 6, 2009, at 5:49 AM, Andrew Gallatin wrote:
Nicolas Droux wrote:
[Bcc'ed [email protected] and [email protected]]
I am pleased to announce the availability of the first revision
of the "Crossbow APIs for Device Drivers" document, available at
the following location:
I recently ported a 10GbE driver to Crossbow. My driver currently
has a single ring-group, and a configurable number of rings. The
NIC hashes received traffic to the rings in hardware.
I'm having a strange issue which I do not see in the non-crossbow
version of the driver. When I run TCP benchmarks, I'm seeing
what seems like packet loss. Specifically, netstat shows
tcpInUnorderBytes and tcpInDupBytes increasing at a rapid rate,
and bandwidth is terrible (~1Gb/s for crossbow, 7Gb/s non-crossbow
on the same box with the same OS revision).
The first thing I suspected was that packets were getting dropped
due to my having the wrong generation number, but a dtrace probe
doesn't show any drops there.
Now I'm wondering if perhaps the interrupt handler is in
the middle of a call to mac_rx_ring() when interrupts
are disabled. Am I supposed to ensure that my interrupt handler is
not calling mac_rx_ring() before my rx_ring_intr_disable()
routine returns? Or does the mac layer serialize this?
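(To make the question concrete, here is a simplified sketch of the RX interrupt path being asked about. The myd_* names and helpers are hypothetical, not the real driver; the comment marks the window the question is really about.)

#include <sys/mac_provider.h>
#include <sys/sunddi.h>

typedef struct myd_rx_ring {
	volatile boolean_t	mr_intr_enabled;	/* cleared by mi_disable */
	mac_handle_t		mr_mh;
	mac_ring_handle_t	mr_rh;
	uint64_t		mr_gen;			/* recorded by mri_start() */
} myd_rx_ring_t;

/* Hypothetical helpers: pull completed packets, mask the ring's MSI. */
extern mblk_t *myd_ring_harvest(myd_rx_ring_t *);
extern void myd_hw_mask_ring_intr(myd_rx_ring_t *);

/* mri_start: the MAC layer hands the driver the generation number to pass back later. */
static int
myd_ring_start(mac_ring_driver_t rdriver, uint64_t gen)
{
	myd_rx_ring_t *ring = (myd_rx_ring_t *)rdriver;

	ring->mr_gen = gen;
	return (0);
}

/* Per-ring RX interrupt handler. */
static uint_t
myd_rx_intr(caddr_t arg1, caddr_t arg2)
{
	myd_rx_ring_t *ring = (myd_rx_ring_t *)arg1;
	mblk_t *chain;

	if (!ring->mr_intr_enabled)
		return (DDI_INTR_CLAIMED);

	chain = myd_ring_harvest(ring);

	/*
	 * The window in question: mi_disable could run on another CPU
	 * right here and return before the call below completes, so
	 * nothing in the driver guarantees that mac_rx_ring() is not
	 * still in flight when rx_ring_intr_disable() returns -- unless
	 * the MAC layer serializes the two itself.
	 */
	if (chain != NULL)
		mac_rx_ring(ring->mr_mh, ring->mr_rh, chain, ring->mr_gen);

	return (DDI_INTR_CLAIMED);
}

/* mi_disable entry point: mask the ring's MSI and note the state. */
static int
myd_ring_intr_disable(mac_intr_handle_t ih)
{
	myd_rx_ring_t *ring = (myd_rx_ring_t *)ih;

	ring->mr_intr_enabled = B_FALSE;
	myd_hw_mask_ring_intr(ring);
	return (0);
}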
Can you reproduce the problem with only one RX ring enabled? If so,
Yes, easily.
something to try would be to bind the poll thread to the same CPU
as the MSI for that single RX ring. To find the CPU the MSI is
bound to, run ::interrupts from mdb, then assign the CPU to use for
the poll thread by doing a "dladm set-linkprop -p cpus=<cpuid>
<link>".
That helps quite a bit. For comparison, with no binding at all, it
looks like this: (~1Gb/s)
TCP tcpRtoAlgorithm = 0 tcpRtoMin = 400
tcpRtoMax = 60000 tcpMaxConn = -1
tcpActiveOpens = 0 tcpPassiveOpens = 0
tcpAttemptFails = 0 tcpEstabResets = 0
tcpCurrEstab = 5 tcpOutSegs = 17456
tcpOutDataSegs = 21 tcpOutDataBytes = 2272
tcpRetransSegs = 0 tcpRetransBytes = 0
tcpOutAck = 17435 tcpOutAckDelayed = 0
tcpOutUrg = 0 tcpOutWinUpdate = 0
tcpOutWinProbe = 0 tcpOutControl = 0
tcpOutRsts = 0 tcpOutFastRetrans = 0
tcpInSegs =124676
tcpInAckSegs = 21 tcpInAckBytes = 2272
tcpInDupAck = 412 tcpInAckUnsent = 0
tcpInInorderSegs =122654 tcpInInorderBytes =175240560
tcpInUnorderSegs = 125 tcpInUnorderBytes =152184
tcpInDupSegs = 412 tcpInDupBytes =590976
tcpInPartDupSegs = 0 tcpInPartDupBytes = 0
tcpInPastWinSegs = 0 tcpInPastWinBytes = 0
tcpInWinProbe = 0 tcpInWinUpdate = 0
tcpInClosed = 0 tcpRttNoUpdate = 0
tcpRttUpdate = 21 tcpTimRetrans = 0
tcpTimRetransDrop = 0 tcpTimKeepalive = 0
tcpTimKeepaliveProbe= 0 tcpTimKeepaliveDrop = 0
tcpListenDrop = 0 tcpListenDropQ0 = 0
tcpHalfOpenDrop = 0 tcpOutSackRetrans = 0
After doing the binding, I'm seeing fewer out-of-order
packets. netstat -s -P tcp 1 now looks like this: (~4Gb/s)
TCP tcpRtoAlgorithm = 0 tcpRtoMin = 400
tcpRtoMax = 60000 tcpMaxConn = -1
tcpActiveOpens = 0 tcpPassiveOpens = 0
tcpAttemptFails = 0 tcpEstabResets = 0
tcpCurrEstab = 5 tcpOutSegs = 46865
tcpOutDataSegs = 3 tcpOutDataBytes = 1600
tcpRetransSegs = 0 tcpRetransBytes = 0
tcpOutAck = 46869 tcpOutAckDelayed = 0
tcpOutUrg = 0 tcpOutWinUpdate = 19
tcpOutWinProbe = 0 tcpOutControl = 0
tcpOutRsts = 0 tcpOutFastRetrans = 0
tcpInSegs =372387
tcpInAckSegs = 3 tcpInAckBytes = 1600
tcpInDupAck = 33 tcpInAckUnsent = 0
tcpInInorderSegs =372264 tcpInInorderBytes =527482971
tcpInUnorderSegs = 14 tcpInUnorderBytes = 18806
tcpInDupSegs = 33 tcpInDupBytes = 46591
tcpInPartDupSegs = 0 tcpInPartDupBytes = 0
tcpInPastWinSegs = 0 tcpInPastWinBytes = 0
tcpInWinProbe = 0 tcpInWinUpdate = 0
tcpInClosed = 0 tcpRttNoUpdate = 0
tcpRttUpdate = 3 tcpTimRetrans = 0
tcpTimRetransDrop = 0 tcpTimKeepalive = 0
tcpTimKeepaliveProbe= 0 tcpTimKeepaliveDrop = 0
tcpListenDrop = 0 tcpListenDropQ0 = 0
tcpHalfOpenDrop = 0 tcpOutSackRetrans = 0
And the old version of the driver, which does not deal with the new
crossbow interfaces:
TCP tcpRtoAlgorithm = 0 tcpRtoMin = 400
tcpRtoMax = 60000 tcpMaxConn = -1
tcpActiveOpens = 0 tcpPassiveOpens = 0
tcpAttemptFails = 0 tcpEstabResets = 0
tcpCurrEstab = 5 tcpOutSegs = 55231
tcpOutDataSegs = 3 tcpOutDataBytes = 1600
tcpRetransSegs = 0 tcpRetransBytes = 0
tcpOutAck = 55228 tcpOutAckDelayed = 0
tcpOutUrg = 0 tcpOutWinUpdate = 465
tcpOutWinProbe = 0 tcpOutControl = 0
tcpOutRsts = 0 tcpOutFastRetrans = 0
tcpInSegs =438394
tcpInAckSegs = 3 tcpInAckBytes = 1600
tcpInDupAck = 0 tcpInAckUnsent = 0
tcpInInorderSegs =438392 tcpInInorderBytes =617512374
tcpInUnorderSegs = 0 tcpInUnorderBytes = 0
tcpInDupSegs = 0 tcpInDupBytes = 0
tcpInPartDupSegs = 0 tcpInPartDupBytes = 0
tcpInPastWinSegs = 0 tcpInPastWinBytes = 0
tcpInWinProbe = 0 tcpInWinUpdate = 0
tcpInClosed = 0 tcpRttNoUpdate = 0
tcpRttUpdate = 3 tcpTimRetrans = 0
tcpTimRetransDrop = 0 tcpTimKeepalive = 0
tcpTimKeepaliveProbe= 0 tcpTimKeepaliveDrop = 0
tcpListenDrop = 0 tcpListenDropQ0 = 0
tcpHalfOpenDrop = 0 tcpOutSackRetrans = 0
There might be a race between the poll thread and the thread trying
to deliver the chain through mac_rx_ring() from interrupt context,
since we currently don't rebind the MSIs to the same CPUs as their
corresponding poll threads. We are planning to do the rebinding of
MSIs, but we are depending on interrupt rebinding APIs which are
still being worked on. The experiment above would allow us to
confirm whether this is the issue seen here or whether we need to look
somewhere else.
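(A rough illustration of the suspected interleaving, with hypothetical myd_* names: the poll thread pulls packets through the driver's mri_poll entry point while the interrupt thread can still be pushing an earlier chain up through mac_rx_ring().)

#include <sys/mac_provider.h>

struct myd_rx_ring;				/* as in the earlier sketches */
extern mblk_t *myd_ring_harvest_bytes(struct myd_rx_ring *, int);

/*
 * mri_poll entry point: called from the MAC poll thread, with the ring's
 * interrupt nominally disabled, to pull up to 'bytes' worth of packets.
 */
static mblk_t *
myd_ring_poll(void *arg, int bytes)
{
	struct myd_rx_ring *ring = arg;

	return (myd_ring_harvest_bytes(ring, bytes));
}

/*
 * Suspected interleaving when the MSI and the poll thread run on
 * different CPUs:
 *
 *   CPU A (interrupt thread)             CPU B (poll thread)
 *   ------------------------             -------------------
 *   harvests chain #1
 *                                        mi_disable() returns
 *                                        mri_poll() harvests chain #2
 *                                        chain #2 delivered upstream
 *   mac_rx_ring(chain #1)                -- arrives after #2
 *
 * Chain #1 landing behind chain #2 would show up upstream as
 * out-of-order segments, consistent with the tcpInUnorderBytes numbers
 * above.  Binding the poll thread to the MSI's CPU removes the overlap,
 * which would explain why the dladm experiment helps.
 */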
BTW, which ONNV build are you currently using?
SunOS dell1435a 5.11 snv_111a i86pc i386 i86pc
This is an OpenSolaris 2009.06 machine (updated from
snv_84).
I can try BFU'ing a different machine to a later build
(once I've BFU'ed it to 111a so as to repro it there).
I'm traveling, and wouldn't have a chance to try that
test until next week.
BTW, did you see my earlier message on networking-discuss
(http://mail.opensolaris.org/pipermail/networking-discuss/2009-April/010979.html)?
That was with the pre-crossbow version of the driver.
Cheers,
Drew
--
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
[email protected] - http://blogs.sun.com/droux