rajagopal kunhappan wrote:
> Andrew Gallatin wrote:
>> Nicolas Droux wrote:
>>>
>>> On May 14, 2009, at 10:51 AM, Andrew Gallatin wrote:
>>>
>>>> Nitin Hande wrote:
>>>>> Andrew Gallatin wrote:
>>>>
>>>>>> When looking at this, I noticed mac_tx_serializer_mode(). Am I
>>>>>> reading this right, in that is serializes a single queue? That
>>>>>> seems lacking, compared to the nxge_serialize stuff it replaces.
>>>>> Yes. This part was done for nxge and as far as I remember recent
>>>>> performance of this scheme was very close to that of the previous
>>>>> scheme. I think Gopi can comment more on this. What part do you
>>>>> think is missing here ?
>>>>
>>>> Perhaps I'm missing something.. Doesn't nxge support multiple TX
>>>> rings? If so, does the existing serialization serialize all
>>>> traffic to a single ring, or is mac_tx_serializer_mode() applied
>>>> after mac_tx_fanout_mode()?
>>>>
>>>> I had thought the original nxge serializer serialized each TX ring
>>>> separately in nxge. The fork I made of it for myri10ge certainly
>>>> works that way.
>>>
>>> The serializer is only for use by the nxge driver which has an
>>> inefficient TX path locking implementation. We didn't have the
>>> resources to completely rewrite the nxge transmit path as part of the
>>> Crossbow project so we moved the serialization implementation in MAC
>>> for that driver. The serializer in MAC does serialization on a
>>> per-ring basis. The serializer should not be used by any other driver.
>>
>> krgopi said, in an earlier reply "mac_tx_serializer_mode() is used
>> when you have a single Tx ring. nxge would not use that mode". So I'm
>> confused. From the source, it looks like nxge is using that mode
>> (MAC_VIRT_SERIALIZE |'ed into mi_v12n_level).
>
> Hi Andrew,
>
> I think the confusion is in the name. mac_tx_serializer_mode() is used
> when you have single ring. nxge exposes multiple rings. When multiple Tx
> rings are present, mac_tx_fanout_mode() get called. In this mode, each
> Tx ring will have a soft ring associated with it. The soft rings
> themselves are stored in the Tx SRS. The packets coming into
> mac_tx_fanout_mode() will get fanned out to one of the Tx soft rings and
> mac_tx_soft_ring_process() gets called. mac_tx_soft_ring_process() can
> either queue up the packets or send directly to the NIC. In the case of
> nxge, packets get queued up in the soft ring and the soft ring worker
> thread sends them to the NIC.
>
> Hope this clarifies.
>
>>> You don't have to use the serializer to support multiple TX rings.
>>> Keep your TX path lean and mean, apply good design principles, e.g.
>>> avoid holding locks for too long on your data-path, and you should be
>>> fine.
>>
>> FWIW. my tx path is very "lean and mean". The only time
>> locks are held are when writing the tx descriptors to the NIC,
>> and when allocating a dma handle from a pre-allocated per-ring pool.
>>
>> I thought the serializer was silly too, but PAE claimed a speedup
>> from it. I think that PAE claimed the speedup came from
>> never back-pressuring the stack when the host overran the
>> NIC. One of the "features" of the serializer was to always
>> block the calling thread if the tx queue was exhausted.
>>
>> Have you done any packets-per-second benchmarks with your
>> fanout code? I'm concerned that its very cache unfriendly
>
> With nxge we get line rate with MTU sized packets with 8 Tx rings. The
> numbers are similar to what is was with nxge serializer in place.
>
>> if you have a long run of packets all going to the same
>> destination. This is because you walk the mblk chain, reading
>> the packet headers and queue up a big chain. If the chain
>> gets too long, the mblk and/or the packet headers will be
>> pushed out of cache by the time they make it to the driver's
>> xmit routine. So in this case you could have twice as many
>> cache misses as normal when things get really backed up.
>
> We would like to have the drivers operate in non-serialized mode. But if
> for whatever reason, you want to use serialized mode, and there are
> issues, we can look into that,

The only reason I care about the serializer is the pre-Crossbow
feedback from PAE that the original serializer avoided putting
backpressure on the stack when the TX rings fill up. I'm happy using
the normal fanout (with some caveats below) as long as PAE doesn't
complain about it later.

The caveats are that I want a fanout mode that uses a standard
Toeplitz hash, so as to maintain CPU locality, or else a hook so I can
implement my own TX-side hashing.

Drew
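
For reference, the kind of Toeplitz hash meant above can be sketched
roughly as follows. This is a generic illustration of the hash used by
receive-side scaling, not code from MAC, nxge, or myri10ge. The names
toeplitz_hash(), rss_sample_key, and pick_tx_ring() are hypothetical,
the key is the widely published RSS sample key rather than anything a
particular NIC is programmed with, and choosing the ring by a simple
modulo is an assumption (hardware typically indexes an indirection
table with the low bits of the hash instead).

```c
#include <stddef.h>
#include <stdint.h>

/* Widely published 40-byte RSS sample key (assumption: a real driver
 * would use whatever key its NIC is programmed with). */
static const uint8_t rss_sample_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa
};

/*
 * Toeplitz hash: for every set bit in the input (MSB first), XOR in
 * the 32-bit window of the key that starts at that bit position.
 * The key must be at least 4 bytes long; for a full-width window at
 * every input bit it should be at least datalen + 4 bytes. Key bytes
 * past the end are treated as zero.
 */
static uint32_t
toeplitz_hash(const uint8_t *key, size_t keylen,
    const uint8_t *data, size_t datalen)
{
    uint32_t hash = 0;
    /* Key bits aligned with the current input bit. */
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
        ((uint32_t)key[2] << 8) | (uint32_t)key[3];
    size_t next = 4;    /* next key byte to shift into the window */

    for (size_t i = 0; i < datalen; i++) {
        uint8_t kb = (next < keylen) ? key[next] : 0;
        next++;
        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            /* Slide the window one bit further along the key. */
            window = (window << 1) | ((uint32_t)(kb >> bit) & 1u);
        }
    }
    return (hash);
}

/* Hypothetical ring selection: map the hash onto nrings TX rings. */
static unsigned int
pick_tx_ring(uint32_t hash, unsigned int nrings)
{
    return (hash % nrings);
}
```

For a TCP/IPv4 flow the input would be the 12 bytes source address,
destination address, source port, destination port, all in network
byte order. If the TX side computes the same hash over the same tuple
with the same key as the NIC's RX side, a connection's transmit and
receive processing can be steered to the same ring index, which is
the CPU locality argument made above.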
