On 05/22/2012 05:43 AM, William Tu wrote:
> Hey guys,
>
> I'm William Tu from Stony Brook University. I'm currently working on
> an ixgbevf driver. Due to some special requirements, I need to
> pre-allocate a pool of contiguous RX and TX buffers (4 MB total in my
> case). I chopped the pool into multiple pages and assigned them
> one by one to the RX and TX ring buffers. I also implemented a bitmap
> to manage allocation and freeing of this DMA pool.
>
> When a packet arrives, the ixgbevf device DMAs it into the RX
> buffer. My modified ixgbevf driver then needs to do an "skb_copy" to
> copy the whole packet out of the pre-allocated pool, so that the
> Linux kernel can later free the copied skb and the buffer in the
> pre-allocated pool can be returned. The same idea applies to
> transmission.
>
> Everything worked fine until I noticed poor receive performance: I
> get TX: 9.4 Gbps but only RX: 1 Gbps. I looked into the problem and
> found that my driver spends quite a long time in
> 1. skb_copy in ixgbevf_clean_rx_irq, and
> 2. netdev_alloc_skb_ip_align (in ixgbevf_alloc_rx_buffers).
>
> Compared with the original ixgbevf code, I found that most drivers
> use dma_map_single/dma_unmap_single, i.e. streaming DMA mappings.
> However, I'm using a coherent DMA mapping (dma_alloc_coherent) to
> allocate one big DMA buffer and assigning each piece to the RX ring.
> I'm wondering about the performance impact of using
> dma_alloc_coherent; is it possible that my poor performance is
> caused by this?
>
>
> Thanks a lot!
> William
>
Hi William,

It sounds like you are taking on quite a bit of overhead with the
skb_copy and netdev allocation calls, so you may want to look for a
way to reduce that overhead.
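
To make the cost concrete, the per-packet work in the copy approach
you describe boils down to roughly the following. This is only an
illustrative sketch; the function and parameter names here are mine,
not taken from your modified driver:

#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/string.h>

/*
 * Illustrative only: the per-packet cost of copying a received frame
 * out of a pre-allocated DMA pool into a freshly allocated skb.
 */
static struct sk_buff *copy_frame_out_of_pool(struct net_device *netdev,
                                              const void *pool_buf,
                                              unsigned int len)
{
        struct sk_buff *skb;

        /* one skb allocation per received frame ... */
        skb = netdev_alloc_skb_ip_align(netdev, len);
        if (!skb)
                return NULL;

        /* ... plus a full copy of the payload out of the DMA pool */
        memcpy(skb_put(skb, len), pool_buf, len);

        return skb;
}

At 10 Gb/s both the allocation and the memcpy are paid on every
frame, which lines up with the hot spots you are seeing in
ixgbevf_clean_rx_irq and ixgbevf_alloc_rx_buffers.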

What you are describing for Rx doesn't sound too different from the
current ixgbe receive path.  In the ixgbe receive path we use pages
mapped for streaming DMA, but instead of unmapping them after the
receive is complete we simply call dma_sync_single_range_for_cpu on
the half of the page the packet was received into and
dma_sync_single_range_for_device on the half we are going to give
back to the device.  This essentially lets us mimic a coherent-style
mapping and hold on to the page for an extended period of time.  To
avoid most of the overhead of having a locked-down buffer we use the
page to store the data section of the frames, and only store the
packet header in the skb->data portion.  This lets us reuse buffers
with minimal overhead compared to the copying approach you described.
The ixgbe code that does this is in the 3.4 kernel, or in our latest
ixgbe driver available on e1000.sf.net.
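
In rough outline the page-reuse idea looks like the sketch below.
This is a minimal illustration with made-up structure and function
names, not the actual ixgbe code, and it skips page refcounting and
most error handling:

#include <linux/dma-mapping.h>
#include <linux/gfp.h>
#include <linux/mm.h>

struct rx_half_page {
        struct page *page;
        dma_addr_t dma;                 /* mapping for the whole page */
        unsigned int page_offset;       /* 0 or PAGE_SIZE / 2 */
};

static int rx_half_page_map(struct device *dev, struct rx_half_page *buf)
{
        buf->page = alloc_page(GFP_ATOMIC);
        if (!buf->page)
                return -ENOMEM;

        /* map the page once for streaming DMA and keep it mapped */
        buf->dma = dma_map_page(dev, buf->page, 0, PAGE_SIZE,
                                DMA_FROM_DEVICE);
        if (dma_mapping_error(dev, buf->dma)) {
                __free_page(buf->page);
                return -ENOMEM;
        }
        buf->page_offset = 0;
        return 0;
}

static void rx_half_page_recv(struct device *dev, struct rx_half_page *buf,
                              unsigned int len)
{
        /* hand the half just written by the hardware over to the CPU */
        dma_sync_single_range_for_cpu(dev, buf->dma, buf->page_offset,
                                      len, DMA_FROM_DEVICE);

        /* ... attach this half to an skb as a page fragment here ... */

        /* flip to the other half and give it back to the device */
        buf->page_offset ^= PAGE_SIZE / 2;
        dma_sync_single_range_for_device(dev, buf->dma, buf->page_offset,
                                         PAGE_SIZE / 2, DMA_FROM_DEVICE);
}

The page stays mapped across packets, so the only per-packet DMA cost
is the cache sync for the region that actually changed hands; there is
no per-packet map/unmap and no copy of the payload.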

Thanks,

Alex
