Andrew Gallatin wrote:
> Nitin Hande wrote:
>> Andrew Gallatin wrote:
>>> Hi,
>>>
>>> Can somebody shed some light on how crossbow hashes outgoing
>>> packets to different transmit rings (not ring groups)?
>>>
>>> My 10GbE driver has multiple rings (and a single group). Each
>>> transmit ring shares an interrupt with a corresponding receive
>>> ring. We call a set of 1 TX ring, 1 RX ring, and interrupt handler
>>> state a "slice". Transmit completions are handled from the interrupt
>>> handler. On OSes which support multiple transmit routes,
>>> we've found that ensuring that a particular connection is always
>>> hashed to the same slice by the host and the NIC helps quite a bit
>>> with performance (improves CPU locality, reduces cache misses,
>>> decreases power consumption).
>>>
>>> Some OSes (like FreeBSD) allow a driver to assist in tagging a
>>> connection so as to ensure that it is easy to hash
>>> traffic for the same connection into the same slice in the host
>>> and the NIC. Others (Linux, S10) allow the driver to hash the
>>> outgoing packets to provide this locality.
>>>
>>> So... Where is the transmit hashing done in crossbow? Is it tunable?
>>> Is there a hook where I can provide a hash routine (like Linux)?
>>> Can I tag packets (like FreeBSD)? Is it at least something standard
>>> like Toeplitz?
>>
>> If your driver has advertised multiple tx rings, then look for
>> mac_tx_fanout_mode(), which in turn computes the hash on the fanout
>> hint passed from ip. Providing hooks for additional hash routines has
>> been suggested.
>
> I guess my best bet might be to lie, say I have only one TX
> ring, and then fan things out myself, like I used to before Crossbow.
> Is there any non-obvious disadvantage to that?
If you advertise a single ring, then the tx path will end up in
mac_tx_single_ring_mode(), the way it does for an e1000g driver. I think
in that case the entry point into the driver is the older xxx_m_tx()
routine; you may have to pay attention to that in your driver. There can
be slight differences between the two schemes. In the case of
mac_tx_single_ring_mode(), if you get backpressured by the driver on the
tx side due to a lack of descriptors, then packets will be enqueued at
the tx SRS. At that point, if there are multiple threads trying to send
additional packets, all of those packets will end up getting queued,
whereas there will be only one worker thread trying to clear the queue
build-up. At high packet rates it's difficult for this one thread to
catch up. (Additionally, look at the MAC_DROP_ON_NO_DESC flag in
mac_tx_srs_no_desc(), which can drop the packets rather than queuing
them.) In mac_tx_fanout_mode(), by contrast, each tx ring gets its own
soft ring and its own worker thread in case of backpressure.

> When looking at this, I noticed mac_tx_serializer_mode(). Am I reading
> this right, in that it serializes a single queue? That seems lacking,
> compared to the nxge_serialize stuff it replaces.

Yes. This part was done for nxge, and as far as I remember, recent
performance of this scheme was very close to that of the previous
scheme. I think Gopi can comment more on this. What part do you think is
missing here?

Nitin

> Drew
