On May 6, 2009, at 5:49 AM, Andrew Gallatin wrote:
Nicolas Droux wrote:
[Bcc'ed [email protected] and [email protected]]
I am pleased to announce the availability of the first revision of
the "Crossbow APIs for Device Drivers" document, available at the
following location:
I recently ported a 10GbE driver to Crossbow. My driver currently
has a single ring-group, and a configurable number of rings. The
NIC hashes received traffic to the rings in hardware.
I'm having a strange issue which I do not see in the non-crossbow
version of the driver. When I run TCP benchmarks, I'm seeing
what seems like packet loss. Specifically, netstat shows
tcpInUnorderBytes and tcpInDupBytes increasing at a rapid rate,
and bandwidth is terrible (~1Gb/s for crossbow, 7Gb/s non-crossbow
on the same box with the same OS revision).
The first thing I suspected was that packets were getting dropped
due to my having the wrong generation number, but a dtrace probe
doesn't show any drops there.
Now I'm wondering if perhaps the interrupt handler is in
the middle of a call to mac_rx_ring() when interrupts
are disabled. Am I supposed to ensure that my interrupt handler is not
calling mac_rx_ring() before my rx_ring_intr_disable()
routine returns? Or does the mac layer serialize this?
Can you reproduce the problem with only one RX ring enabled? If so,
something to try would be to bind the poll thread to the same CPU as
the MSI for that single RX ring. To find the CPU the MSI is bound to,
run ::interrupts from mdb, then assign the CPU to use for the poll
thread by doing a "dladm set-linkprop -p cpus=<cpuid> <link>".
There might be a race between the poll thread and the thread trying to
deliver the chain through mac_rx_ring() from interrupt context, since
we currently don't rebind the MSIs to the same CPUs as their
corresponding poll threads. We are planning to do the rebinding of
MSIs, but we are depending on interrupt rebinding APIs which are still
being worked on. The experiment above would allow us to confirm
whether that is the issue seen here or if we need to look somewhere else.
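
For illustration only, here is a user-space model of the ordering asked
about above: once the interrupt-disable routine has returned, the
interrupt path no longer delivers. It is a sketch under assumptions, not
the Crossbow API or what the mac layer does. The rx_ring_t type, its
fields, and all four functions are hypothetical names, a pthread mutex
stands in for a kernel mutex, and a counter stands in for the call to
mac_rx_ring().

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-ring state; none of these names come from Crossbow. */
typedef struct rx_ring {
    pthread_mutex_t lock;      /* serializes the interrupt and disable paths */
    bool            polling;   /* true while the ring is in polling mode */
    unsigned long   delivered; /* chains handed up from interrupt context */
} rx_ring_t;

void rx_ring_init(rx_ring_t *r)
{
    pthread_mutex_init(&r->lock, NULL);
    r->polling = false;
    r->delivered = 0;
}

/*
 * Stand-in for the driver's interrupt-disable routine. Taking the lock
 * means this cannot return while rx_ring_intr() is mid-delivery.
 */
void rx_ring_intr_disable(rx_ring_t *r)
{
    pthread_mutex_lock(&r->lock);
    r->polling = true;
    pthread_mutex_unlock(&r->lock);
}

/* Stand-in for the matching interrupt-enable routine. */
void rx_ring_intr_enable(rx_ring_t *r)
{
    pthread_mutex_lock(&r->lock);
    r->polling = false;
    pthread_mutex_unlock(&r->lock);
}

/*
 * Stand-in for the interrupt handler. Returns true if it delivered.
 * The counter increment marks where a driver would call mac_rx_ring().
 */
bool rx_ring_intr(rx_ring_t *r)
{
    bool deliver;

    pthread_mutex_lock(&r->lock);
    deliver = !r->polling;
    if (deliver)
        r->delivered++;
    pthread_mutex_unlock(&r->lock);
    return deliver;
}
```

With this model, a call to rx_ring_intr() made after
rx_ring_intr_disable() has returned delivers nothing until
rx_ring_intr_enable() is called. Whether a real driver can hold a lock
across the upcall into the mac layer is a separate question that this
sketch does not answer.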
BTW, which ONNV build are you currently using?
Nicolas.
Thanks,
Drew
--
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
[email protected] - http://blogs.sun.com/droux