Andrew,

Thanks for the additional info. We'd like to verify that interrupts
are getting disabled from interrupt context itself. If you don't mind,
could you gather an aggregation of the callers of
mac_hwring_disable_intr() during one of your runs? You should be able
to do this with:

    dtrace -n 'fbt::mac_hwring_disable_intr:entry{@[stack()] = count()}'

Thanks,
Nicolas.

On May 6, 2009, at 6:22 PM, Andrew Gallatin wrote:

> Nicolas Droux wrote:
>> On May 6, 2009, at 5:49 AM, Andrew Gallatin wrote:
>>> Nicolas Droux wrote:
>>>> [Bcc'ed driver-discuss@opensolaris.org and networking-discuss@opensolaris.org]
>>>> I am pleased to announce the availability of the first revision  
>>>> of the "Crossbow APIs for Device Drivers" document, available at  
>>>> the following location:
>>>
>>> I recently ported a 10GbE driver to Crossbow.  My driver currently
>>> has a single ring-group, and a configurable number of rings.  The
>>> NIC hashes received traffic to the rings in hardware.
>>>
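>>> For reference, the ring registration in my driver looks roughly like
>>> the sketch below (heavily trimmed, and the myd_* names are just
>>> placeholders for my real structures, so treat it as illustrative
>>> rather than as the actual code):
>>>
>>>    /* Sketch: advertise one static RX group with N rings via the */
>>>    /* MAC_CAPAB_RINGS capability from the mc_getcapab(9E) entry. */
>>>    #include <sys/mac_provider.h>
>>>
>>>    static boolean_t
>>>    myd_m_getcapab(void *arg, mac_capab_t cap, void *cap_data)
>>>    {
>>>            myd_t *myd = arg;      /* placeholder per-instance softc */
>>>
>>>            switch (cap) {
>>>            case MAC_CAPAB_RINGS: {
>>>                    mac_capab_rings_t *cap_rings = cap_data;
>>>
>>>                    if (cap_rings->mr_type != MAC_RING_TYPE_RX)
>>>                            return (B_FALSE);
>>>                    cap_rings->mr_group_type = MAC_GROUP_TYPE_STATIC;
>>>                    cap_rings->mr_rnum = myd->myd_num_rx_rings;
>>>                    cap_rings->mr_gnum = 1;
>>>                    /* callbacks that fill in mac_ring_info_t and */
>>>                    /* mac_group_info_t for each ring and group   */
>>>                    cap_rings->mr_rget = myd_fill_ring;
>>>                    cap_rings->mr_gget = myd_fill_group;
>>>                    return (B_TRUE);
>>>            }
>>>            default:
>>>                    return (B_FALSE);
>>>            }
>>>    }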
>>>
>>> I'm having a strange issue which I do not see in the non-crossbow
>>> version of the driver.  When I run TCP benchmarks, I'm seeing
>>> what seems like packet loss.  Specifically, netstat shows
>>> tcpInUnorderBytes and tcpInDupBytes increasing at a rapid rate,
>>> and bandwidth is terrible (~1Gb/s for crossbow, 7Gb/s non-crossbow
>>> on the same box with the same OS revision).
>>>
>>> The first thing I suspected was that packets were getting dropped
>>> due to my having the wrong generation number, but a dtrace probe
>>> doesn't show any drops there.
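>>>
>>> The generation number handling I mean is along these lines (again a
>>> trimmed sketch with placeholder myd_* names): the ring start entry
>>> point records the generation number and the interrupt handler hands
>>> it back on every mac_rx_ring() call, so a stale value would explain
>>> silent drops:
>>>
>>>    static int
>>>    myd_ring_start(mac_ring_driver_t rh, uint64_t mr_gen_num)
>>>    {
>>>            myd_rx_ring_t *ring = (myd_rx_ring_t *)rh;
>>>
>>>            /* remembered for later mac_rx_ring() calls */
>>>            ring->mr_gen_num = mr_gen_num;
>>>            return (0);
>>>    }
>>>
>>>    static uint_t
>>>    myd_intr(caddr_t arg1, caddr_t arg2)
>>>    {
>>>            myd_rx_ring_t *ring = (myd_rx_ring_t *)arg1;
>>>            mblk_t *chain = myd_rx_collect(ring); /* completed rx bufs */
>>>
>>>            if (chain != NULL) {
>>>                    mac_rx_ring(ring->mr_mh, ring->mr_rh, chain,
>>>                        ring->mr_gen_num);
>>>            }
>>>            return (DDI_INTR_CLAIMED);
>>>    }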
>>>
>>> Now I'm wondering if perhaps the interrupt handler is in
>>> the middle of a call to mac_rx_ring() when interrupts
>>> are disabled.  Am I supposed to ensure that my interrupt handler
>>> is not calling mac_rx_ring() before my rx_ring_intr_disable()
>>> routine returns?  Or does the mac layer serialize this?
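>>>
>>> If it turns out I do have to serialize, I was thinking of something
>>> like the sketch below (placeholder locking, just to show the intent;
>>> the interrupt handler would take the same mr_lock around its
>>> mac_rx_ring() call and skip delivery once mr_poll_mode is set):
>>>
>>>    static int
>>>    myd_ring_intr_disable(mac_intr_handle_t ih)
>>>    {
>>>            myd_rx_ring_t *ring = (myd_rx_ring_t *)ih;
>>>
>>>            mutex_enter(&ring->mr_lock);   /* waits out an in-flight  */
>>>            ring->mr_poll_mode = B_TRUE;   /* interrupt-side delivery */
>>>            myd_hw_mask_intr(ring);        /* placeholder: mask the MSI */
>>>            mutex_exit(&ring->mr_lock);
>>>            return (0);
>>>    }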
>> Can you reproduce the problem with only one RX ring enabled? If so,
>
> Yes, easily.
>
>> something to try would be to bind the poll thread to the same CPU  
>> as the MSI for that single RX ring. To find the CPU the MSI is  
>> bound to, run ::interrupts from mdb, then assign the CPU to use for  
>> the poll thread by doing a "dladm set-linkprop -p cpus=<cpuid>
>> <link>".
>
> That helps quite a bit.  For comparison, with no binding at all, it  
> looks like this: (~1Gb/s)
>
> TCP     tcpRtoAlgorithm     =     0     tcpRtoMin           =   400
>        tcpRtoMax           = 60000     tcpMaxConn          =    -1
>        tcpActiveOpens      =     0     tcpPassiveOpens     =     0
>        tcpAttemptFails     =     0     tcpEstabResets      =     0
>        tcpCurrEstab        =     5     tcpOutSegs          = 17456
>        tcpOutDataSegs      =    21     tcpOutDataBytes     =  2272
>        tcpRetransSegs      =     0     tcpRetransBytes     =     0
>        tcpOutAck           = 17435     tcpOutAckDelayed    =     0
>        tcpOutUrg           =     0     tcpOutWinUpdate     =     0
>        tcpOutWinProbe      =     0     tcpOutControl       =     0
>        tcpOutRsts          =     0     tcpOutFastRetrans   =     0
>        tcpInSegs           =124676
>        tcpInAckSegs        =    21     tcpInAckBytes       =  2272
>        tcpInDupAck         =   412     tcpInAckUnsent      =     0
>        tcpInInorderSegs    =122654     tcpInInorderBytes   =175240560
>        tcpInUnorderSegs    =   125     tcpInUnorderBytes   =152184
>        tcpInDupSegs        =   412     tcpInDupBytes       =590976
>        tcpInPartDupSegs    =     0     tcpInPartDupBytes   =     0
>        tcpInPastWinSegs    =     0     tcpInPastWinBytes   =     0
>        tcpInWinProbe       =     0     tcpInWinUpdate      =     0
>        tcpInClosed         =     0     tcpRttNoUpdate      =     0
>        tcpRttUpdate        =    21     tcpTimRetrans       =     0
>        tcpTimRetransDrop   =     0     tcpTimKeepalive     =     0
>        tcpTimKeepaliveProbe=     0     tcpTimKeepaliveDrop =     0
>        tcpListenDrop       =     0     tcpListenDropQ0     =     0
>        tcpHalfOpenDrop     =     0     tcpOutSackRetrans   =     0
>
> After doing the binding, I'm seeing fewer out-of-order
> packets.  netstat -s -P tcp 1 now looks like this: (~4Gb/s)
>
>
>
> TCP     tcpRtoAlgorithm     =     0     tcpRtoMin           =   400
>        tcpRtoMax           = 60000     tcpMaxConn          =    -1
>        tcpActiveOpens      =     0     tcpPassiveOpens     =     0
>        tcpAttemptFails     =     0     tcpEstabResets      =     0
>        tcpCurrEstab        =     5     tcpOutSegs          = 46865
>        tcpOutDataSegs      =     3     tcpOutDataBytes     =  1600
>        tcpRetransSegs      =     0     tcpRetransBytes     =     0
>        tcpOutAck           = 46869     tcpOutAckDelayed    =     0
>        tcpOutUrg           =     0     tcpOutWinUpdate     =    19
>        tcpOutWinProbe      =     0     tcpOutControl       =     0
>        tcpOutRsts          =     0     tcpOutFastRetrans   =     0
>        tcpInSegs           =372387
>        tcpInAckSegs        =     3     tcpInAckBytes       =  1600
>        tcpInDupAck         =    33     tcpInAckUnsent      =     0
>        tcpInInorderSegs    =372264     tcpInInorderBytes   =527482971
>        tcpInUnorderSegs    =    14     tcpInUnorderBytes   = 18806
>        tcpInDupSegs        =    33     tcpInDupBytes       = 46591
>        tcpInPartDupSegs    =     0     tcpInPartDupBytes   =     0
>        tcpInPastWinSegs    =     0     tcpInPastWinBytes   =     0
>        tcpInWinProbe       =     0     tcpInWinUpdate      =     0
>        tcpInClosed         =     0     tcpRttNoUpdate      =     0
>        tcpRttUpdate        =     3     tcpTimRetrans       =     0
>        tcpTimRetransDrop   =     0     tcpTimKeepalive     =     0
>        tcpTimKeepaliveProbe=     0     tcpTimKeepaliveDrop =     0
>        tcpListenDrop       =     0     tcpListenDropQ0     =     0
>        tcpHalfOpenDrop     =     0     tcpOutSackRetrans   =     0
>
> And the old version of the driver, which does not deal with the new
> crossbow interfaces:
>
> TCP     tcpRtoAlgorithm     =     0     tcpRtoMin           =   400
>        tcpRtoMax           = 60000     tcpMaxConn          =    -1
>        tcpActiveOpens      =     0     tcpPassiveOpens     =     0
>        tcpAttemptFails     =     0     tcpEstabResets      =     0
>        tcpCurrEstab        =     5     tcpOutSegs          = 55231
>        tcpOutDataSegs      =     3     tcpOutDataBytes     =  1600
>        tcpRetransSegs      =     0     tcpRetransBytes     =     0
>        tcpOutAck           = 55228     tcpOutAckDelayed    =     0
>        tcpOutUrg           =     0     tcpOutWinUpdate     =   465
>        tcpOutWinProbe      =     0     tcpOutControl       =     0
>        tcpOutRsts          =     0     tcpOutFastRetrans   =     0
>        tcpInSegs           =438394
>        tcpInAckSegs        =     3     tcpInAckBytes       =  1600
>        tcpInDupAck         =     0     tcpInAckUnsent      =     0
>        tcpInInorderSegs    =438392     tcpInInorderBytes   =617512374
>        tcpInUnorderSegs    =     0     tcpInUnorderBytes   =     0
>        tcpInDupSegs        =     0     tcpInDupBytes       =     0
>        tcpInPartDupSegs    =     0     tcpInPartDupBytes   =     0
>        tcpInPastWinSegs    =     0     tcpInPastWinBytes   =     0
>        tcpInWinProbe       =     0     tcpInWinUpdate      =     0
>        tcpInClosed         =     0     tcpRttNoUpdate      =     0
>        tcpRttUpdate        =     3     tcpTimRetrans       =     0
>        tcpTimRetransDrop   =     0     tcpTimKeepalive     =     0
>        tcpTimKeepaliveProbe=     0     tcpTimKeepaliveDrop =     0
>        tcpListenDrop       =     0     tcpListenDropQ0     =     0
>        tcpHalfOpenDrop     =     0     tcpOutSackRetrans   =     0
>
>
>
>> There might be a race between the poll thread and the thread trying  
>> to deliver the chain through mac_rx_ring() from interrupt context,  
>> since we currently don't rebind the MSIs to the same CPUs as their  
>> corresponding poll threads. We are planning to do the rebinding of  
>> MSIs, but we are depending on interrupt rebinding APIs which are  
>> still being worked on. The experiment above would allow us to
>> confirm whether that is the issue seen here or if we need to look
>> somewhere else.
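>>
>> (To make the two delivery paths concrete: in the polling model the
>> poll thread calls the ring's mri_poll entry point and gets a chain
>> back directly, while the interrupt path pushes chains up through
>> mac_rx_ring(). A generic sketch, not your driver:
>>
>>    /* Sketch of an RX ring poll entry point (mri_poll): called   */
>>    /* from the mac poll thread once the ring's interrupt has     */
>>    /* been disabled; the chain is returned to the caller rather  */
>>    /* than pushed up through mac_rx_ring().                      */
>>    static mblk_t *
>>    drv_ring_poll(void *arg, int bytes_to_pickup)
>>    {
>>            drv_rx_ring_t *ring = arg;
>>
>>            return (drv_rx_collect(ring, bytes_to_pickup));
>>    }
>>
>> If a mac_rx_ring() call from interrupt context is still in flight on
>> another CPU at that point, the two paths can interleave and reorder
>> packets, which would be consistent with the tcpInUnorderBytes you're
>> seeing.)
>>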
>> BTW, which ONNV build are you currently using?
>
> SunOS dell1435a 5.11 snv_111a i86pc i386 i86pc
>
> This is an OpenSolaris 2009.06 machine (updated from
> snv_84).
>
> I can try BFU'ing a different machine to a later build
> (once I've BFU'ed it to 111a so as to repro it there).
> I'm traveling, and wouldn't have a chance to try that
> test until next week.
>
>
> BTW, did you see my earlier message on networking-discuss
> (http://mail.opensolaris.org/pipermail/networking-discuss/2009-April/010979.html)?
> That was with the pre-crossbow version of the driver.
>
> Cheers,
>
> Drew

-- 
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
nicolas.droux at sun.com - http://blogs.sun.com/droux

