Nicolas Droux wrote:
> 
> On May 6, 2009, at 5:49 AM, Andrew Gallatin wrote:
> 
>> Nicolas Droux wrote:
>>> [Bcc'ed driver-discuss at opensolaris.org and 
>>> networking-discuss at opensolaris.org]
>>> I am pleased to announce the availability of the first revision of 
>>> the "Crossbow APIs for Device Drivers" document, available at the 
>>> following location:
>>
>> I recently ported a 10GbE driver to Crossbow.  My driver currently
>> has a single ring-group, and a configurable number of rings.  The
>> NIC hashes received traffic to the rings in hardware.
>>
>>
>> I'm having a strange issue which I do not see in the non-crossbow
>> version of the driver.  When I run TCP benchmarks, I'm seeing
>> what seems like packet loss.  Specifically, netstat shows
>> tcpInUnorderBytes and tcpInDupBytes increasing at a rapid rate,
>> and bandwidth is terrible (~1Gb/s for crossbow, 7Gb/s non-crossbow
>> on the same box with the same OS revision).
>>
>> The first thing I suspected was that packets were getting dropped
>> due to my having the wrong generation number, but a dtrace probe
>> doesn't show any drops there.
>>
>> Now I'm wondering if perhaps the interrupt handler is in
>> the middle of a call to mac_rx_ring() when interrupts
>> are disabled. Am I supposed to ensure that my interrupt handler is not
>> calling mac_rx_ring() before my rx_ring_intr_disable()
>> routine returns?  Or does the mac layer serialize this?
> 
> Can you reproduce the problem with only one RX ring enabled? If so, 

Yes, easily.

> something to try would be to bind the poll thread to the same CPU as the 
> MSI for that single RX ring. To find the CPU the MSI is bound to, run 
> ::interrupts from mdb, then assign the CPU to use for the poll thread by 
> doing a "dladm set-linkprop -p cpus=<cpuid> <link>".

That helps quite a bit.  For comparison, with no binding at all, it 
looks like this: (~1Gb/s)

TCP     tcpRtoAlgorithm     =     0     tcpRtoMin           =   400
         tcpRtoMax           = 60000     tcpMaxConn          =    -1
         tcpActiveOpens      =     0     tcpPassiveOpens     =     0
         tcpAttemptFails     =     0     tcpEstabResets      =     0
         tcpCurrEstab        =     5     tcpOutSegs          = 17456
         tcpOutDataSegs      =    21     tcpOutDataBytes     =  2272
         tcpRetransSegs      =     0     tcpRetransBytes     =     0
         tcpOutAck           = 17435     tcpOutAckDelayed    =     0
         tcpOutUrg           =     0     tcpOutWinUpdate     =     0
         tcpOutWinProbe      =     0     tcpOutControl       =     0
         tcpOutRsts          =     0     tcpOutFastRetrans   =     0
         tcpInSegs           =124676
         tcpInAckSegs        =    21     tcpInAckBytes       =  2272
         tcpInDupAck         =   412     tcpInAckUnsent      =     0
         tcpInInorderSegs    =122654     tcpInInorderBytes   =175240560
         tcpInUnorderSegs    =   125     tcpInUnorderBytes   =152184
         tcpInDupSegs        =   412     tcpInDupBytes       =590976
         tcpInPartDupSegs    =     0     tcpInPartDupBytes   =     0
         tcpInPastWinSegs    =     0     tcpInPastWinBytes   =     0
         tcpInWinProbe       =     0     tcpInWinUpdate      =     0
         tcpInClosed         =     0     tcpRttNoUpdate      =     0
         tcpRttUpdate        =    21     tcpTimRetrans       =     0
         tcpTimRetransDrop   =     0     tcpTimKeepalive     =     0
         tcpTimKeepaliveProbe=     0     tcpTimKeepaliveDrop =     0
         tcpListenDrop       =     0     tcpListenDropQ0     =     0
         tcpHalfOpenDrop     =     0     tcpOutSackRetrans   =     0

After doing the binding, I'm seeing fewer out-of-order
packets.  netstat -s -P tcp 1 now looks like this: (~4Gb/s)



TCP     tcpRtoAlgorithm     =     0     tcpRtoMin           =   400
         tcpRtoMax           = 60000     tcpMaxConn          =    -1
         tcpActiveOpens      =     0     tcpPassiveOpens     =     0
         tcpAttemptFails     =     0     tcpEstabResets      =     0
         tcpCurrEstab        =     5     tcpOutSegs          = 46865
         tcpOutDataSegs      =     3     tcpOutDataBytes     =  1600
         tcpRetransSegs      =     0     tcpRetransBytes     =     0
         tcpOutAck           = 46869     tcpOutAckDelayed    =     0
         tcpOutUrg           =     0     tcpOutWinUpdate     =    19
         tcpOutWinProbe      =     0     tcpOutControl       =     0
         tcpOutRsts          =     0     tcpOutFastRetrans   =     0
         tcpInSegs           =372387
         tcpInAckSegs        =     3     tcpInAckBytes       =  1600
         tcpInDupAck         =    33     tcpInAckUnsent      =     0
         tcpInInorderSegs    =372264     tcpInInorderBytes   =527482971
         tcpInUnorderSegs    =    14     tcpInUnorderBytes   = 18806
         tcpInDupSegs        =    33     tcpInDupBytes       = 46591
         tcpInPartDupSegs    =     0     tcpInPartDupBytes   =     0
         tcpInPastWinSegs    =     0     tcpInPastWinBytes   =     0
         tcpInWinProbe       =     0     tcpInWinUpdate      =     0
         tcpInClosed         =     0     tcpRttNoUpdate      =     0
         tcpRttUpdate        =     3     tcpTimRetrans       =     0
         tcpTimRetransDrop   =     0     tcpTimKeepalive     =     0
         tcpTimKeepaliveProbe=     0     tcpTimKeepaliveDrop =     0
         tcpListenDrop       =     0     tcpListenDropQ0     =     0
         tcpHalfOpenDrop     =     0     tcpOutSackRetrans   =     0

And with the old version of the driver, which does not use the new
Crossbow interfaces:

TCP     tcpRtoAlgorithm     =     0     tcpRtoMin           =   400
         tcpRtoMax           = 60000     tcpMaxConn          =    -1
         tcpActiveOpens      =     0     tcpPassiveOpens     =     0
         tcpAttemptFails     =     0     tcpEstabResets      =     0
         tcpCurrEstab        =     5     tcpOutSegs          = 55231
         tcpOutDataSegs      =     3     tcpOutDataBytes     =  1600
         tcpRetransSegs      =     0     tcpRetransBytes     =     0
         tcpOutAck           = 55228     tcpOutAckDelayed    =     0
         tcpOutUrg           =     0     tcpOutWinUpdate     =   465
         tcpOutWinProbe      =     0     tcpOutControl       =     0
         tcpOutRsts          =     0     tcpOutFastRetrans   =     0
         tcpInSegs           =438394
         tcpInAckSegs        =     3     tcpInAckBytes       =  1600
         tcpInDupAck         =     0     tcpInAckUnsent      =     0
         tcpInInorderSegs    =438392     tcpInInorderBytes   =617512374
         tcpInUnorderSegs    =     0     tcpInUnorderBytes   =     0
         tcpInDupSegs        =     0     tcpInDupBytes       =     0
         tcpInPartDupSegs    =     0     tcpInPartDupBytes   =     0
         tcpInPastWinSegs    =     0     tcpInPastWinBytes   =     0
         tcpInWinProbe       =     0     tcpInWinUpdate      =     0
         tcpInClosed         =     0     tcpRttNoUpdate      =     0
         tcpRttUpdate        =     3     tcpTimRetrans       =     0
         tcpTimRetransDrop   =     0     tcpTimKeepalive     =     0
         tcpTimKeepaliveProbe=     0     tcpTimKeepaliveDrop =     0
         tcpListenDrop       =     0     tcpListenDropQ0     =     0
         tcpHalfOpenDrop     =     0     tcpOutSackRetrans   =     0
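
To make the three samples above easier to compare, here is a minimal
Python sketch that scales the out-of-order and duplicate segment counts
to events per 100,000 received segments. The counters are copied from
the dumps above; the sample labels and the per_100k() helper are just
illustrative names, not part of netstat or any Solaris tool.

```python
# Counters copied from the three netstat samples above:
# tcpInSegs, tcpInUnorderSegs and tcpInDupSegs.
samples = {
    "crossbow, unbound": {"in_segs": 124676, "unorder": 125, "dup": 412},
    "crossbow, bound": {"in_segs": 372387, "unorder": 14, "dup": 33},
    "pre-crossbow": {"in_segs": 438394, "unorder": 0, "dup": 0},
}


def per_100k(count, total):
    """Scale a counter to events per 100,000 received segments."""
    return 100000.0 * count / total


# Normalized rates for each sample, keyed by the same labels.
rates = {
    name: {
        "unorder": per_100k(c["unorder"], c["in_segs"]),
        "dup": per_100k(c["dup"], c["in_segs"]),
    }
    for name, c in samples.items()
}

for name, r in rates.items():
    print(f"{name:20s} unorder={r['unorder']:7.1f} dup={r['dup']:7.1f} (per 100k segs)")
```

Normalized this way, the bound Crossbow run has out-of-order and
duplicate rates more than an order of magnitude below the unbound run,
while the pre-Crossbow driver shows none at all in its sample.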



> There might be a race between the poll thread and the thread trying to 
> deliver the chain through mac_rx_ring() from interrupt context, since we 
> currently don't rebind the MSIs to the same CPUs as their corresponding 
> poll threads. We are planning to do the rebinding of MSIs, but we are 
> depending on interrupt rebinding APIs which are still being worked on. 
> The experiment above would allow us to confirm whether it is the issue 
> seen here or if we need to look somewhere else.
> 
> BTW, which ONNV build are you currently using?

SunOS dell1435a 5.11 snv_111a i86pc i386 i86pc

This is an OpenSolaris 2009.06 machine (updated from
snv_84).

I can try BFU'ing a different machine to a later build
(once I've BFU'ed it to 111a so as to repro it there).
I'm traveling, and wouldn't have a chance to try that
test until next week.


BTW, did you see my earlier message on networking-discuss
(http://mail.opensolaris.org/pipermail/networking-discuss/2009-April/010979.html)?
That was with the pre-crossbow version of the driver.

Cheers,

Drew
