Re: [driver-discuss] A question: can be avoid using ´bcopy´ in Tx of the NIC driver?

Brian Xu - Sun Microsystems - Beijing China Mon, 02 Mar 2009 23:35:07 -0800

Garrett D'Amore wrote:

Brian Xu - Sun Microsystems - Beijing China wrote:
Garrett D'Amore wrote:
Brian Xu - Sun Microsystems - Beijing China wrote:
Hi there,
I have a question here:
Why do all of the NIC drivers have to bcopy the MBLKs for transmit? (Some of them always bcopy, and others bcopy only below a packet-length threshold.)
I think one of the reasons is that the overhead of setting up DMA on the fly is greater than the overhead of bcopy for short packets. I want to know if this is the case and if there are any other reasons.
Yes. For any reasonably sized packet (ETHERMTU or smaller), bcopy is faster on *all* recent hardware. (This is confirmed on even an older 300MHz Via C3.) (Hmm... I've heard that for some Niagara systems this might not be true, however. But I've not tested it myself.)
Even with bcopy, a pre-bound DMA resource is still needed. So the bcopy size threshold depends on whether the overhead of a DMA bind on the fly is greater than the overhead of a bcopy to a pre-bound DMA address. The hardware itself only knows that DMA is needed.
You pay for the pre-bound DMA setup at attach() time, so it doesn't play a role. So you have to compare the cost of bcopy() vs. the cost of ddi_dma_addr_setup().
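
The copy-versus-bind trade-off described above can be sketched in userspace C. This is only an illustration, not code from any real driver: the names TX_COPY_THRESHOLD, tx_copy_buf, tx_choose_method() and tx_prepare() are hypothetical, the threshold value is an assumption, and memcpy() stands in for bcopy() so the sketch compiles outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tunable: packets at or below this size take the copy path. */
#define TX_COPY_THRESHOLD 1500

enum tx_method { TX_COPY, TX_BIND };

/*
 * Stand-in for a transmit buffer whose DMA binding was set up once,
 * at attach() time, so the per-packet cost is only the copy.
 */
struct tx_copy_buf {
	unsigned char data[TX_COPY_THRESHOLD];
	size_t len;
};

/* Decide purely on packet length which path a packet should take. */
enum tx_method
tx_choose_method(size_t pkt_len)
{
	return (pkt_len <= TX_COPY_THRESHOLD ? TX_COPY : TX_BIND);
}

/*
 * Prepare a packet for transmit. Small packets are copied into the
 * pre-bound buffer. Large packets are left alone; a real driver would
 * bind them for DMA on the fly at this point.
 */
enum tx_method
tx_prepare(struct tx_copy_buf *cb, const void *pkt, size_t pkt_len)
{
	assert(cb != NULL && pkt != NULL);

	if (tx_choose_method(pkt_len) == TX_COPY) {
		memcpy(cb->data, pkt, pkt_len);
		cb->len = pkt_len;
		return (TX_COPY);
	}
	return (TX_BIND);
}
```

Where the threshold should sit depends on measuring both paths on the target hardware; the value here is a placeholder.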

That is really what I meant.

There is a lot of additional complexity for tx as well, because you have to deal with the fact that packets may cross page boundaries and require multiple DMA cookies. This adds a lot of complexity, and not all drivers can deal well with multiple descriptors per packet.
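
To make the page-boundary point concrete, here is a small sketch that counts how many pages a buffer touches, which is an upper bound on the number of DMA cookies when physically contiguous pages are not merged. The 4096-byte page size and the name dma_cookies_worst_case() are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size; real code would query the platform. */
#define PAGE_SIZE_ASSUMED 4096u

/*
 * Number of pages touched by the byte range [vaddr, vaddr + len).
 * Each page may need its own cookie if neighbouring pages are not
 * physically contiguous.
 */
size_t
dma_cookies_worst_case(uintptr_t vaddr, size_t len)
{
	assert(len > 0);

	uintptr_t first = vaddr / PAGE_SIZE_ASSUMED;
	uintptr_t last = (vaddr + len - 1) / PAGE_SIZE_ASSUMED;

	return ((size_t)(last - first + 1));
}
```

Even a 1500-byte frame needs two cookies when it starts near the end of a page, so a bind-on-the-fly tx path cannot assume one descriptor per packet.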

Just as we do for ddi_dma_buf_bind_handle, the shadow page list records all the mapped physical pages, so you don't have to worry about crossing page boundaries.

I still don't know whether there are reasons other than the overhead of DMA setup.
Complexity. There are various concerns involved, such as a race with _fini() and esballoc (for the rx path).
Also you have to worry about alignment. Not all hardware can transmit arbitrarily aligned packets. With all the work you wind up doing to make this work correctly, you get very little performance benefit. So it's rarely worth the pain and suffering. For regular MTU frames, it just isn't worth it, ever. On reasonably modern hardware, anyway.
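
As an illustration of the alignment concern, a driver that binds on the fly would need a check along these lines before handing a buffer to the hardware, falling back to a copy when it fails. The function name and the power-of-two requirement are assumptions of this sketch, not taken from any particular device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if vaddr is a multiple of align.
 * align must be a non-zero power of two.
 */
bool
tx_buf_is_aligned(uintptr_t vaddr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);

	return ((vaddr & (align - 1)) == 0);
}
```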

As for alignment, I think it is fine to do whatever large-packet transmit (DMA bind on the fly) already does.


Thanks,
Brian

For rx, you can eliminate a lot of the DMA costs by recycling buffers. But the complexity to do this "well" without introducing potential panics is high. Almost every driver that has tried has gotten this wrong at some point. Some of them are still wrong.
   -- Garrett
Thanks,
Brian
I think the situation is different with jumbo frames, though.
If what I guess is the major cause, I have a proposal, and I would like to hear your advice on whether it makes sense.
The most time-consuming action in the DMA setup is the DMA bind, more specifically, calling into the VM layer to get the PFN for the vaddr (hat_getpfnum()), since it needs to search the huge page table. The MBLKs, however, are essentially slab objects: the PFN has already been determined in the slab layer, and for most of their usage we only touch the magazine layer, where the PFN is a predetermined one. That is, the PFN should be considered constructed state, but we don't leverage it for the DMA bind.
In storage, we have a field 'b_shadow' in buf(9S) to store the pages which were recently used, through which the PFNs can easily be obtained. So in the case that b_shadow works, ddi_dma_buf_bind_handle() is much faster than ddi_dma_mem_bind_handle(). Another example: moving the DMA bind of the HBA driver (mpt) from the Tx path to the kmem cache constructor gave the mpt driver a 26% throughput increase. See CR6707308.
If the mblk could store the PFN info and we had a ddi_dma_mblk_bind_handle()-like interface, then I think it would benefit the performance of the NIC drivers. I consulted the PAE, and got an answer that the bcopy is typically about 10-15% of a NIC TX workload.
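
The "PFN as constructed state" idea can be illustrated with a toy object cache in userspace C. Everything here is hypothetical: slow_pfn_lookup() merely stands in for an expensive translation such as hat_getpfnum(), the fixed-size array stands in for a kmem cache, and recording a single PFN per object is a simplification (a real buffer may span pages). The point is only that the expensive lookup runs once per object, in the constructor step, rather than once per transmit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 4

/* Counts how often the expensive translation is performed. */
static unsigned long expensive_lookups;

/* Stand-in for an expensive vaddr-to-PFN translation. */
static uint64_t
slow_pfn_lookup(const void *vaddr)
{
	expensive_lookups++;
	return ((uint64_t)(uintptr_t)vaddr >> 12);
}

struct tx_obj {
	unsigned char payload[2048];
	uint64_t pfn;		/* constructed state, survives free */
	int constructed;
	int in_use;
};

static struct tx_obj cache[CACHE_SLOTS];

/* Allocate an object; construct it only the first time it is used. */
struct tx_obj *
tx_obj_alloc(void)
{
	for (size_t i = 0; i < CACHE_SLOTS; i++) {
		struct tx_obj *o = &cache[i];

		if (o->in_use)
			continue;
		if (!o->constructed) {
			o->pfn = slow_pfn_lookup(o->payload);
			o->constructed = 1;
		}
		o->in_use = 1;
		return (o);
	}
	return (NULL);
}

/* Return an object to the cache; its PFN stays valid for reuse. */
void
tx_obj_free(struct tx_obj *o)
{
	assert(o != NULL && o->in_use);
	o->in_use = 0;
}
```

In this sketch a thousand allocate/free cycles of the same object trigger a single lookup, which is the effect the proposal hopes to get for mblks.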
There are things that can be done to make DMA faster, better, and simpler. In an ideal world, the GLDv3 could do most of this work, and the mblk could just carry the ddi_dma_cookie with it.
   -- Garrett
Thanks,
Brian

_______________________________________________
driver-discuss mailing list
driver-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/driver-discuss
