Hi Greg,

After checking a few functions, I found that my VF is getting an
indication of a PF reset.

ixgbevf_watchdog_task periodically calls the check_link function, which
is implemented by ixgbe_check_mac_link_vf. That function in turn calls
ixgbe_check_for_rst_vf, which always returns zero, meaning the PF has
set the reset-done bit. So ixgbe_check_mac_link_vf sets link_up = false
and speed = 0, and returns -1 to watchdog_task, which then calls
netif_carrier_off and netif_tx_stop_all_queues.

My question is: since I did not reset my PF, why does my VF keep
getting a PF reset indication? (Does "PF reset" mean the PF itself has
been reset, or that the PF has reset the VF?)

Thank you!


Regards,
William

On Tue, May 29, 2012 at 11:49 PM, Greg Rose <[email protected]> wrote:
> On Mon, 28 May 2012 21:30:44 +0800
> William Tu <[email protected]> wrote:
>
>> Hey Alex,
>>
>> Thank you for pointing out the two possible functions. It turns out
>> that the problem of the throughput periodically dropping to zero is
>> caused by ixgbevf_watchdog_task. For some unknown reason, my VF is
>> taken down, and this watchdog_task function brings my VF driver back
>> up every 2 seconds. I guess some PF-VF communication is missing, so
>> the PF tries to disable my VF. I have to check my MR-SRIOV
>> implementation.
>>
>> In ixgbevf_watchdog_task:
>>
>> if ((hw->mac.ops.check_link(hw, &link_speed,
>>                         &link_up, false)) != 0) {
>>             adapter->link_up = link_up;
>>             adapter->link_speed = link_speed;
>>             netif_carrier_off(netdev);
>>             netif_tx_stop_all_queues(netdev);
>>             schedule_work(&adapter->reset_task);
>>             goto pf_has_reset;
>> ...
>> ...
>> pf_has_reset:
>>     /* Reset the timer */
>>     if (!test_bit(__IXGBEVF_DOWN, &adapter->state))
>>         mod_timer(&adapter->watchdog_timer,
>>             round_jiffies(jiffies + (2 * HZ)));
>>
>> I worked around this issue by increasing the watchdog frequency to
>> once every 200 ms (note that mod_timer takes an absolute expiry in
>> jiffies):
>>   mod_timer(&adapter->watchdog_timer, jiffies + HZ / 5);
>> and I got a smooth 9.3 Gbps TCP RX throughput.
>
> It looks like the VF driver either isn't getting an indication that
> link is up when it reads the VFLINKS register, or it is getting an
> indication of a PF reset.
>
> Something to check for anyway.
>
> - Greg
>
>
>>
>>
>> Thanks again.
>> William
>>
>> On Sat, May 26, 2012 at 12:42 AM, Alexander Duyck
>> <[email protected]> wrote:
>> > William,
>> >
>> > Based on the fact things are dropping to 0 it sounds like you might
>> > be losing interrupts.  We have code that will re-trigger the
>> > interrupts once every 2 seconds to deal with platforms that may
>> > occasionally lose an MSI-X interrupt.  That could be one reason you
>> > are seeing it recover after a second or so.  Try commenting out the
>> > ixgbe_irq_rearm_queues call in the ixgbe_check_hang_subtask.  If
>> > the adapter completely stalls and doesn't recover, then the issue
>> > is lost interrupts, which may be a sign of problems with the MR-IOV
>> > environment.
>> >
>> > The other thing that you might want to check for would be to
>> > determine if your test is using UDP or TCP.  Typically for an issue
>> > like this I would recommend running with UDP in order to guarantee
>> > something like a dropped acknowledgement doesn't stall the stream.
>> >
>> > If you still see issues after that the only other possibility I can
>> > think of would be a problem with the DMA flow to/from the device.
>> >
>> > Thanks,
>> >
>> > Alex
>> >
>> > On 05/23/2012 10:19 PM, William Tu wrote:
>> >> Hi Alex,
>> >>
>> >> Thanks for the suggestion! It turns out that the overhead of
>> >> skb_copy and netdev_alloc_skb was because I had turned on the
>> >> kernel debugging option for the SLUB memory allocator
>> >> (CONFIG_SLUB_DEBUG). That's why memory allocation took an
>> >> extremely long time, which slowed down my RX throughput!
>> >>
>> >> In our case, we are trying to deliver a software-based MR-SRIOV
>> >> system. We run the PF driver on one host (H1) and multiple VF
>> >> drivers on another host (H2). Between H1 and H2, there is a memory
>> >> sharing/interrupt forwarding device for H2 VF to communicate with
>> >> H1 PF.
>> >>
>> >> Right now my RX performance is achieving 9G but is a little bit
>> >> unstable:
>> >> * About every 10 seconds the throughput drops to almost zero and
>> >> then resumes full speed. Has anyone run into this issue before?
>> >> Any suggestions are appreciated!
>> >>
>> >> [  3] local 192.168.1.4 port 35451 connected with 192.168.1.21 port 5001
>> >> [ ID] Interval       Transfer     Bandwidth
>> >> [  3]  0.0- 1.0 sec   428 MBytes  3.59 Gbits/sec
>> >> [  3]  1.0- 2.0 sec  0.00 Bytes  0.00 bits/sec
>> >> [  3]  2.0- 3.0 sec  1.00 GBytes  8.62 Gbits/sec
>> >> [  3]  3.0- 4.0 sec  1.07 GBytes  9.21 Gbits/sec
>> >> [  3]  4.0- 5.0 sec  1.09 GBytes  9.38 Gbits/sec
>> >> [  3]  5.0- 6.0 sec  1.09 GBytes  9.35 Gbits/sec
>> >> [  3]  6.0- 7.0 sec  1.09 GBytes  9.38 Gbits/sec
>> >> [  3]  7.0- 8.0 sec  1.09 GBytes  9.38 Gbits/sec
>> >> [  3]  8.0- 9.0 sec  1.07 GBytes  9.16 Gbits/sec
>> >> [  3]  9.0-10.0 sec  0.00 Bytes  0.00 bits/sec     --> drop to 0 bps
>> >> [  3] 10.0-11.0 sec  1.01 GBytes  8.71 Gbits/sec
>> >> [  3] 11.0-12.0 sec  1.09 GBytes  9.38 Gbits/sec
>> >> [  3] 12.0-13.0 sec  1.09 GBytes  9.33 Gbits/sec
>> >> [  3] 13.0-14.0 sec  1.09 GBytes  9.36 Gbits/sec
>> >> [  3] 14.0-15.0 sec  1.09 GBytes  9.39 Gbits/sec
>> >> [  3] 15.0-16.0 sec  1.09 GBytes  9.38 Gbits/sec
>> >> [  3] 16.0-17.0 sec  1.09 GBytes  9.32 Gbits/sec
>> >> [  3] 17.0-18.0 sec  1.09 GBytes  9.40 Gbits/sec
>> >> [  3] 18.0-19.0 sec   295 MBytes  2.47 Gbits/sec
>> >> [  3] 19.0-20.0 sec  0.00 Bytes  0.00 bits/sec     --> drop to 0 bps
>> >> [  3] 20.0-21.0 sec  1.03 GBytes  8.80 Gbits/sec
>> >> [  3] 21.0-22.0 sec  1.09 GBytes  9.39 Gbits/sec
>> >> [  3] 22.0-23.0 sec  1.09 GBytes  9.36 Gbits/sec
>> >> [  3] 23.0-24.0 sec  1.09 GBytes  9.38 Gbits/sec
>> >> [  3] 24.0-25.0 sec  81.9 MBytes   687 Mbits/sec
>> >> [  3] 25.0-26.0 sec  0.00 Bytes  0.00 bits/sec     --> drop to 0 bps
>> >> [  3] 26.0-27.0 sec  1.02 GBytes  8.80 Gbits/sec
>> >>
>> >> Thanks a lot!
>> >> William
>> >>
>> >> On Wed, May 23, 2012 at 12:08 AM, Alexander Duyck
>> >> <[email protected]> wrote:
>> >>> On 05/22/2012 05:43 AM, William Tu wrote:
>> >>>> Hey guys,
>> >>>>
>> >>>> I'm William Tu from Stony Brook University. I'm currently
>> >>>> working on an ixgbevf driver. Due to some special requirements,
>> >>>> I need to pre-allocate a pool of contiguous RX and TX buffers
>> >>>> (4 MB total in my case). I chopped the pool into multiple pages
>> >>>> and assigned them one by one to the RX and TX ring buffers. I
>> >>>> also implemented a bitmap to manage allocation and freeing of
>> >>>> this DMA pool.
>> >>>>
>> >>>> When a packet comes in, the ixgbevf device DMAs it into the RX
>> >>>> buffer. Then my modified version of the ixgbevf driver needs to
>> >>>> do an "skb_copy" to copy the whole packet out of the
>> >>>> pre-allocated pool, so that the Linux kernel can later free the
>> >>>> copied skb and the buffer in the pre-allocated pool can be
>> >>>> freed. The same idea applies to transmission.
>> >>>>
>> >>>> Everything works fine until I found a poor reception
>> >>>> performance. I got TX: 9.4Gbps and RX: 1Gbps. I looked into the
>> >>>> problem and found my driver spent quite a long time in doing
>> >>>> 1. skb_copy in ixgbevf_clean_rx_irq and
>> >>>> 2. netdev_alloc_skb_ip_align (in ixgbevf_alloc_rx_buffers).
>> >>>>
>> >>>> Compared with original ixgbevf code, I found most of the drivers
>> >>>> are using dma_map_single/dma_unmap_single, which is streaming DMA
>> >>>> mappings. However, I'm using coherent DMA mapping
>> >>>> (dma_alloc_coherent) to allocate a big DMA buffer and assigning
>> >>>> each piece to the RX ring. I'm wondering about the performance
>> >>>> impact of using dma_alloc_coherent; is it possible that my poor
>> >>>> performance is caused by this?
>> >>>>
>> >>>>
>> >>>> Thanks a lot!
>> >>>> William
>> >>>>
>> >>> Hi William,
>> >>>
>> >>> It sounds like you are taking on quite a bit of overhead with the
>> >>> skb_copy and netdev allocation calls.  You may want to consider
>> >>> finding a means of reducing that overhead.
>> >>>
>> >>> What you are describing for Rx doesn't sound too different from
>> >>> the current ixgbe receive path.  For the ixgbe receive path we
>> >>> are using pages that we mapped as a streaming DMA, however
>> >>> instead of un-mapping them after the receive is complete we are
>> >>> simply calling dma_sync_single_range_for_cpu on the half we
>> >>> received the packet in and calling
>> >>> dma_sync_single_range_for_device on the half we are going to give
>> >>> back to the device.  This essentially allows us to mimic a
>> >>> coherent-style mapping and to hold on to the page for an
>> >>> extended period of time.  To avoid most of the overhead of
>> >>> having a locked-down buffer we are using the page to store the
>> >>> data section of the frames, and only storing the packet header in
>> >>> the skb->data portion.  This allows us to reuse buffers with
>> >>> minimal overhead for doing so versus the copying approach you
>> >>> described.  The code for ixgbe to do this is in either the 3.4
>> >>> kernel, or our latest ixgbe driver available on e1000.sf.net.
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Alex
>> >
>>
>

_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit 
http://communities.intel.com/community/wired
