William,

Based on the fact that throughput is dropping to 0, it sounds like you
might be losing interrupts.  We have code that will re-trigger the
interrupts once every 2 seconds to deal with platforms that may
occasionally lose an MSI-X interrupt.  That could be one reason you are
seeing it recover after a second or so.  Try commenting out the
ixgbe_irq_rearm_queues call in ixgbe_check_hang_subtask.  If the adapter
completely stalls and doesn't recover, then the issue is lost interrupts
and may be a sign of problems with the MR-IOV environment.
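
For reference, this is roughly where that call sits; the exact layout of
ixgbe_check_hang_subtask differs a bit between driver versions, so treat
the following as a sketch rather than an exact copy of the source:

    static void ixgbe_check_hang_subtask(struct ixgbe_adapter *adapter)
    {
            u64 eics = 0;
            int i;

            /* ... hang-detection logic elided ... */

            /* get one bit for every active tx/rx interrupt vector */
            for (i = 0; i < adapter->num_msix_vectors - NON_Q_VECTORS; i++) {
                    struct ixgbe_q_vector *qv = adapter->q_vector[i];

                    if (qv->rx.ring || qv->tx.ring)
                            eics |= ((u64)1 << i);
            }

            /* Cause software interrupt to ensure rings are cleaned.
             * For the test, comment out the call below: with it gone, a
             * lost MSI-X interrupt is never re-triggered, so a stall that
             * never recovers points at lost interrupts.
             */
            ixgbe_irq_rearm_queues(adapter, eics);
    }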

The other thing you might want to check is whether your test is using
UDP or TCP.  For an issue like this I would typically recommend running
with UDP in order to guarantee that something like a dropped
acknowledgement doesn't stall the stream.

If you still see issues after that, the only other possibility I can
think of would be a problem with the DMA flow to/from the device.

Thanks,

Alex

On 05/23/2012 10:19 PM, William Tu wrote:
> Hi Alex,
>
> Thanks for the suggestion! It turns out that the overhead of skb_copy
> and netdev_alloc_skb came from the kernel debugging option for the SLUB
> memory allocator (CONFIG_SLUB_DEBUG), which I had turned on. That's why
> memory allocation was taking so much longer, which slowed down my RX
> throughput!
>
> In our case, we are trying to deliver a software-based MR-SRIOV
> system. We run the PF driver on one host (H1) and multiple VF drivers
> on another host (H2). Between H1 and H2 there is a memory
> sharing/interrupt forwarding device that lets the VFs on H2 communicate
> with the PF on H1.
>
> Right now my RX performance is reaching 9G but is a little bit unstable:
> * About every 10 seconds the throughput drops to almost zero and then
> resumes full speed again. Has anyone run into this issue before? Any
> suggestions would be appreciated!
>
> [  3] local 192.168.1.4 port 35451 connected with 192.168.1.21 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0- 1.0 sec   428 MBytes  3.59 Gbits/sec
> [  3]  1.0- 2.0 sec  0.00 Bytes  0.00 bits/sec
> [  3]  2.0- 3.0 sec  1.00 GBytes  8.62 Gbits/sec
> [  3]  3.0- 4.0 sec  1.07 GBytes  9.21 Gbits/sec
> [  3]  4.0- 5.0 sec  1.09 GBytes  9.38 Gbits/sec
> [  3]  5.0- 6.0 sec  1.09 GBytes  9.35 Gbits/sec
> [  3]  6.0- 7.0 sec  1.09 GBytes  9.38 Gbits/sec
> [  3]  7.0- 8.0 sec  1.09 GBytes  9.38 Gbits/sec
> [  3]  8.0- 9.0 sec  1.07 GBytes  9.16 Gbits/sec
> [  3]  9.0-10.0 sec  0.00 Bytes  0.00 bits/sec          --> drop to 0 bps
> [  3] 10.0-11.0 sec  1.01 GBytes  8.71 Gbits/sec
> [  3] 11.0-12.0 sec  1.09 GBytes  9.38 Gbits/sec
> [  3] 12.0-13.0 sec  1.09 GBytes  9.33 Gbits/sec
> [  3] 13.0-14.0 sec  1.09 GBytes  9.36 Gbits/sec
> [  3] 14.0-15.0 sec  1.09 GBytes  9.39 Gbits/sec
> [  3] 15.0-16.0 sec  1.09 GBytes  9.38 Gbits/sec
> [  3] 16.0-17.0 sec  1.09 GBytes  9.32 Gbits/sec
> [  3] 17.0-18.0 sec  1.09 GBytes  9.40 Gbits/sec
> [  3] 18.0-19.0 sec   295 MBytes  2.47 Gbits/sec
> [  3] 19.0-20.0 sec  0.00 Bytes  0.00 bits/sec      --> drop to 0 bps
> [  3] 20.0-21.0 sec  1.03 GBytes  8.80 Gbits/sec
> [  3] 21.0-22.0 sec  1.09 GBytes  9.39 Gbits/sec
> [  3] 22.0-23.0 sec  1.09 GBytes  9.36 Gbits/sec
> [  3] 23.0-24.0 sec  1.09 GBytes  9.38 Gbits/sec
> [  3] 24.0-25.0 sec  81.9 MBytes   687 Mbits/sec
> [  3] 25.0-26.0 sec  0.00 Bytes  0.00 bits/sec      --> drop to 0 bps
> [  3] 26.0-27.0 sec  1.02 GBytes  8.80 Gbits/sec
>
> Thanks a lot!
> William
>
> On Wed, May 23, 2012 at 12:08 AM, Alexander Duyck
> <[email protected]> wrote:
>> On 05/22/2012 05:43 AM, William Tu wrote:
>>> Hey guys,
>>>
>>> I'm William Tu from Stony Brook University. I'm currently working on
>>> the ixgbevf driver. Due to some special requirements, I need to
>>> pre-allocate a pool of contiguous RX and TX buffers (4MB total in my
>>> case). I chopped the pool into multiple pages and assigned them one by
>>> one to the RX and TX rings. I also implemented a bitmap to manage
>>> allocation and freeing of buffers in this DMA pool.
>>>
>>> When a packet arrives, the ixgbevf device DMAs it into the RX buffer.
>>> Then my modified version of the ixgbevf driver needs to do an
>>> "skb_copy" to copy the whole packet out of the pre-allocated pool, so
>>> that the Linux kernel can later free the copied skb and the buffer in
>>> the pre-allocated pool can be freed back to the pool. The same idea
>>> applies to transmission.
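>>>
>>> Roughly, the pool management looks like the sketch below; names such
>>> as my_dma_pool and pool_buf_get are just placeholders to illustrate
>>> the bitmap scheme, not my actual code:
>>>
>>>     struct my_dma_pool {
>>>             void *vaddr;            /* CPU address from dma_alloc_coherent() */
>>>             dma_addr_t dma;         /* matching bus address */
>>>             unsigned long *bitmap;  /* one bit per PAGE_SIZE chunk */
>>>             unsigned int nr_chunks;
>>>             spinlock_t lock;
>>>     };
>>>
>>>     /* hand out one page-sized chunk of the pre-allocated pool */
>>>     static int pool_buf_get(struct my_dma_pool *p, void **va, dma_addr_t *da)
>>>     {
>>>             unsigned int i;
>>>
>>>             spin_lock(&p->lock);
>>>             i = find_first_zero_bit(p->bitmap, p->nr_chunks);
>>>             if (i >= p->nr_chunks) {
>>>                     spin_unlock(&p->lock);
>>>                     return -ENOMEM;
>>>             }
>>>             set_bit(i, p->bitmap);
>>>             spin_unlock(&p->lock);
>>>
>>>             *va = p->vaddr + i * PAGE_SIZE;
>>>             *da = p->dma + i * PAGE_SIZE;
>>>             return 0;
>>>     }
>>>
>>>     /* return a chunk once the copied skb has been handed up the stack */
>>>     static void pool_buf_put(struct my_dma_pool *p, void *va)
>>>     {
>>>             clear_bit((va - p->vaddr) / PAGE_SIZE, p->bitmap);
>>>     }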
>>>
>>> Everything worked fine until I found poor receive performance: I got
>>> TX: 9.4Gbps but RX: 1Gbps. I looked into the problem and found that my
>>> driver spends quite a long time doing
>>> 1. skb_copy in ixgbevf_clean_rx_irq and
>>> 2. netdev_alloc_skb_ip_align (in ixgbevf_alloc_rx_buffers).
>>>
>>> Compared with the original ixgbevf code, I found that most drivers use
>>> dma_map_single/dma_unmap_single, i.e. streaming DMA mappings. However,
>>> I'm using a coherent DMA mapping (dma_alloc_coherent) to allocate one
>>> big DMA buffer and assigning each piece to the RX ring. I'm wondering
>>> about the performance impact of using dma_alloc_coherent: is it
>>> possible that my poor performance is caused by this?
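>>>
>>> For concreteness, the two idioms I'm comparing look roughly like this
>>> (sketch only, not my actual code):
>>>
>>>     #include <linux/dma-mapping.h>
>>>
>>>     /* sketch: streaming per-buffer mapping vs. one coherent pool */
>>>     static int dma_style_examples(struct device *dev, void *buf, size_t bufsz)
>>>     {
>>>             dma_addr_t dma, pool_dma;
>>>             void *pool;
>>>
>>>             /* streaming: map one buffer per DMA transaction, unmap when done */
>>>             dma = dma_map_single(dev, buf, bufsz, DMA_FROM_DEVICE);
>>>             if (dma_mapping_error(dev, dma))
>>>                     return -ENOMEM;
>>>             /* ... device writes the packet ... */
>>>             dma_unmap_single(dev, dma, bufsz, DMA_FROM_DEVICE);
>>>
>>>             /* coherent: one big allocation up front; each RX descriptor
>>>              * is then pointed at pool_dma + i * PAGE_SIZE
>>>              */
>>>             pool = dma_alloc_coherent(dev, 4UL << 20, &pool_dma, GFP_KERNEL);
>>>             if (!pool)
>>>                     return -ENOMEM;
>>>             dma_free_coherent(dev, 4UL << 20, pool, pool_dma);
>>>             return 0;
>>>     }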
>>>
>>>
>>> Thanks a lot!
>>> William
>>>
>> Hi William,
>>
>> It sounds like you are taking on quite a bit of overhead with the
>> skb_copy and netdev allocation calls.  You may want to consider finding
>> a means of reducing that overhead.
>>
>> What you are describing for Rx doesn't sound too different from the
>> current ixgbe receive path.  In the ixgbe receive path we use pages
>> that we mapped as streaming DMA; however, instead of un-mapping them
>> after the receive is complete, we simply call
>> dma_sync_single_range_for_cpu on the half we received the packet in
>> and dma_sync_single_range_for_device on the half we are going to give
>> back to the device.  This essentially allows us to mimic a
>> coherent-style mapping and to hold on to the page for an extended
>> period of time.  To avoid most of the overhead of having a locked-down
>> buffer, we use the page to store the data section of the frames and
>> store only the packet header in the skb->data portion.  This allows us
>> to reuse buffers with minimal overhead compared to the copying
>> approach you described.  The code for ixgbe to do this is in either
>> the 3.4 kernel or our latest ixgbe driver available on e1000.sf.net.
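>>
>> In rough outline the reuse looks something like the sketch below; it is
>> simplified and the names follow the driver's conventions, so look at
>> the actual ixgbe_clean_rx_irq for the details:
>>
>>     /* simplified sketch of the half-page reuse, not the literal ixgbe
>>      * code; rx_buffer holds the page, its DMA mapping, and the current
>>      * half-page offset, as in struct ixgbe_rx_buffer
>>      */
>>     static void rx_reuse_page_sketch(struct ixgbe_ring *rx_ring,
>>                                      struct ixgbe_rx_buffer *rx_buffer,
>>                                      struct sk_buff *skb, unsigned int size)
>>     {
>>             /* hand the half the NIC just wrote over to the CPU */
>>             dma_sync_single_range_for_cpu(rx_ring->dev, rx_buffer->dma,
>>                                           rx_buffer->page_offset,
>>                                           IXGBE_RX_BUFSZ, DMA_FROM_DEVICE);
>>
>>             /* only the header is pulled into skb->data; the payload
>>              * stays in the page and is attached as a fragment
>>              */
>>             skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buffer->page,
>>                             rx_buffer->page_offset, size, IXGBE_RX_BUFSZ);
>>
>>             /* flip to the other half and give it back to the device */
>>             rx_buffer->page_offset ^= IXGBE_RX_BUFSZ;
>>             dma_sync_single_range_for_device(rx_ring->dev, rx_buffer->dma,
>>                                              rx_buffer->page_offset,
>>                                              IXGBE_RX_BUFSZ, DMA_FROM_DEVICE);
>>     }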
>>
>> Thanks,
>>
>> Alex

