> -----Original Message-----
> From: discuss [mailto:[email protected]] On Behalf Of Daniele
> Di Proietto
> Sent: Thursday, July 16, 2015 8:03 PM
> To: Stokes, Ian
> Cc: [email protected]
> Subject: Re: [ovs-discuss] OVS segmentation fault due to incorrect TX queue
> setup with netdev-dpdk
> 
> Hi,
> 
> Thanks for the very detailed report!  I've sent two
> patches to the mailing list that should address the
> issues.  Would you mind testing them?
> 
> More comments inline
> 
> On 15/07/2015 11:49, "Stokes, Ian" <[email protected]> wrote:
> 
> >Hi All,
> >
> >I've been investigating a segmentation fault caused by the incorrect
> >setup of TX queues for netdev-dpdk. It occurs in the following scenario.
> >
> >Running OVS with DPDK on a system with 72 cores (hyper-threading enabled)
> >and using an Intel XL710 network card.
> >
> >The default behavior in OVS when adding a DPDK physical port is to attempt
> >to set up one tx queue for each core detected on the system plus one more
> >queue for non-pmd threads.
> >
> >In this case 73 tx queues will be requested in total.
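> >
> >(For reference, the requested count comes from the number of cores
> >reported by ovs-numa plus one extra queue reserved for non-pmd threads;
> >roughly the following, with the variable name made up purely for
> >illustration:)
> >
> >    /* One tx queue per core detected on the system, plus one queue for
> >     * non-pmd threads.  With 72 cores this requests 73 tx queues. */
> >    requested_n_txq = ovs_numa_get_n_cores() + 1;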
> >
> >The standard behavior when initializing a DPDK port is to check the
> >number of queues being requested against the max number of queues
> >available for the device itself.
> >
> >This is done in dpdk_eth_dev_init() with the following code segment
> >...
> >    rte_eth_dev_info_get(dev->port_id, &info);
> >    dev->up.n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq);
> >    dev->real_n_txq = MIN(info.max_tx_queues, dev->up.n_txq);
> >
> >    diag = rte_eth_dev_configure(dev->port_id, dev->up.n_rxq,
> >dev->real_n_txq, &port_conf);
> >...
> >
> >The smaller of the two values is selected as the real number of tx queues
> >that can be set up. This accommodates a situation where we could have more
> >cores on a system than we have tx queues on the DPDK network device.
> >
> >This has worked fine with the previous generation of Intel interfaces
> >such as the Intel 82599. However, it will not work with the XL710.
> >
> >In DPDK the XL710 has a total of 316 tx queues that can be used. From the
> >check above we would think we can allocate 73 of these tx queues without
> >issue. But the 316 queues available are subdivided between different
> >queue types.
> >
> >For a DPDK host application (in this case OVS) queues 1 to 64 inclusive
> >can be used. However, queues 65 to 96 are strictly for SRIOV tx queue use.
> >
> >
> >The check for max_tx_queues above will identify the total number of
> >queues available (316), compare it to the number of queues being
> >requested (73), and will select 73 as real_n_txq. But this is not the
> >number of tx queues actually usable by OVS (64).
> >
> >We can cause the switch to segfault by doing the following:
> >
> >Add a DPDK physical port:
> >
> >sudo $OVS_DIR/utilities/ovs-vsctl add-br br0 -- set Bridge br0
> >datapath_type=netdev
> >sudo $OVS_DIR/utilities/ovs-vsctl add-port br0 dpdk0 -- set Interface
> >dpdk0 type=dpdk
> >
> >This will output the following warning
> >ovs-vsctl: Error detected while setting up 'dpdk0'.  See ovs-vswitchd log
> >for details.
> >
> >Looking at the log we see
> >
> >PMD: i40e_dev_tx_queue_setup(): Using simple tx path
> >PMD: i40e_pf_get_vsi_by_qindex(): queue_idx out of range. VMDQ configured?
> >2015-07-15T01:22:48Z|00019|dpdk|ERR|eth dev tx queue setup error -5
> >2015-07-15T01:22:48Z|00020|dpif_netdev|ERR|dpdk0, cannot set multiq
> >2015-07-15T01:22:48Z|00021|dpif|WARN|netdev@ovs-netdev: failed to add
> >dpdk0 as port: Resource temporarily unavailable
> >
> >This is as expected. This warning will be reported in dpdk_eth_dev_init()
> >by the following code segment when it attempts to initialize the 65th
> >queue
> >
> >    for (i = 0; i < dev->real_n_txq; i++) {
> >        diag = rte_eth_tx_queue_setup(dev->port_id, i, NIC_PORT_TX_Q_SIZE,
> >                                      dev->socket_id, NULL);
> >        if (diag) {
> >            VLOG_ERR("eth dev tx queue setup error %d",diag);
> >            return -diag;
> >        }
> >    }
> >
> >Then add a port of type internal to the same bridge:
> >
> >sudo $OVS_DIR/utilities/ovs-vsctl add-port br0 testif1 -- set interface
> >testif1 type=internal
> >
> >I was surprised to see that after adding the internal port, the DPDK
> >port that failed previously is now added as well. Is this expected
> >behavior?
> 
> No, this is a genuine OVS bug.  The first patch in the series
> addresses that.
> 
> >Looking at the vswitch log I can see that both the internal port and the
> >DPDK port have port IDs now.
> >
> >2015-07-15T01:23:19Z|00024|bridge|INFO|bridge br0: added interface
> >testif1 on port 1
> >2015-07-15T01:23:19Z|00025|dpif_netdev|INFO|Created 1 pmd threads on numa
> >node 0
> >2015-07-15T01:23:19Z|00001|dpif_netdev(pmd40)|INFO|Core 0 processing port
> >'dpdk0'
> >2015-07-15T01:23:19Z|00002|dpif_netdev(pmd40)|INFO|Core 0 processing port
> >'dpdk0'
> >2015-07-15T01:23:19Z|00026|bridge|INFO|bridge br0: added interface dpdk0
> >on port 2
> >2015-07-15T01:23:19Z|00027|bridge|INFO|bridge br0: using datapath ID
> >00006805ca2d3cb8
> >
> >If we assign an IP to the internal port we will segfault the vswitch
> >sudo ip addr add 192.168.1.1/24 dev testif1
> >
> >This is caused by the internal interface broadcasting an ICMP6 neighbor
> >solicitation message. This packet is copied from kernel space memory to
> >DPDK memory in the netdev_dpdk_send__() function.
> >The issue is that the qid passed to the netdev_dpdk_send__() function
> >is 72. This packet will eventually be transmitted by rte_eth_tx_burst
> >with a tx qid of 72.
> >In DPDK, queue 72 on the XL710 is for SRIOV use only, so it is never
> >initialized by the rte_eth_tx_queue_setup loop above, and the switch
> >segfaults when an attempt is made to access it.
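> >
> >(Purely for illustration, the crash could at least be caught before the
> >burst with a guard along these lines in the send path; this is only a
> >sketch, not one of the proposed fixes:)
> >
> >    /* Illustrative guard: refuse to transmit on a queue that was never
> >     * set up, instead of crashing inside rte_eth_tx_burst(). */
> >    if (OVS_UNLIKELY(qid >= dev->real_n_txq)) {
> >        VLOG_WARN("tx qid %d out of range, only %d txqs configured",
> >                  qid, dev->real_n_txq);
> >        return;    /* drop the packets */
> >    }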
> >
> >
> >In terms of a solution to this I would appreciate some feedback on what
> >people think is the best approach.
> >
> >Ideally DPDK could extend the number of sequential queues supported for
> >host DPDK applications.
> >Previous-generation cards supported 128 TX queues that could be used with
> >a host application, which is why this issue is not seen with them.
> >
> >This, however, would not fix the immediate issue and would be more of a
> >long-term solution. It could be flagged in the documentation as a known
> >issue/corner case that is not supported in the meantime.
> >
> >Alternatively, OVS could attempt to set up as many queues as possible on
> >the DPDK device itself. If an error is detected, the appropriate fields,
> >such as dev->real_n_txq, would have to be updated.
> >In this case we would set up 64 of the requested 73 queues and log a
> >warning message to the user. However, there may be issues with how the
> >pmd threads map to the correct tx queue IDs.
> >I've noticed that when netdev_dpdk_send__() is called the qid is 72, and
> >this value comes from dp_execute_cb(), where the tx_qid is taken from the
> >dp_netdev_pmd_thread.
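> >
> >(A rough sketch of this retry approach, based on the existing
> >dpdk_eth_dev_init() loop; error handling and the rx queue setup are
> >trimmed for brevity, so treat it as pseudocode rather than a patch:)
> >
> >    /* Sketch: if setting up tx queue i fails, reconfigure the device
> >     * with only i tx queues and record the reduced count. */
> >    n_txq = dev->up.n_txq;                    /* 73 requested here */
> >    for (;;) {
> >        diag = rte_eth_dev_configure(dev->port_id, dev->up.n_rxq,
> >                                     n_txq, &port_conf);
> >        if (diag) {
> >            break;
> >        }
> >        for (i = 0; i < n_txq; i++) {
> >            diag = rte_eth_tx_queue_setup(dev->port_id, i,
> >                                          NIC_PORT_TX_Q_SIZE,
> >                                          dev->socket_id, NULL);
> >            if (diag) {
> >                break;
> >            }
> >        }
> >        if (!diag) {
> >            break;                            /* all queues set up */
> >        }
> >        VLOG_WARN("only %d tx queues could be set up, retrying", i);
> >        n_txq = i;                            /* 64 on the XL710 */
> >    }
> >    dev->real_n_txq = n_txq;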
> >
> >Any Feedback would be appreciated.
> 
> netdev-dpdk already supports working with a smaller number of
> txqs by using a spinlock.
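> 
> Roughly, the idea is that when there are fewer configured txqs than
> threads, each thread's qid is folded back into the valid range and a
> per-queue spinlock serializes access.  A sketch (not the exact code):
> 
>     /* Sketch: fold the qid into the configured range and lock the
>      * shared queue while transmitting. */
>     qid = qid % dev->real_n_txq;
>     rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
>     nb_tx = rte_eth_tx_burst(dev->port_id, qid, pkts, cnt);
>     rte_spinlock_unlock(&dev->tx_q[qid].tx_lock);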
> 
> The ideal fix (IMHO) would be for DPDK to report the usable
> number of transmission queues.  In the meantime we can retry
> with fewer transmission queues if the queue setup fails.
> 
> I've implemented this workaround in the second patch of the
> series.

Thanks. I agree, ideally we could get this information from DPDK. The DPDK
team is aware of it, and hopefully we can get a fix into R2.2.

> 
> >
> >Thanks
> >Ian
> >
> >
> >
> >
> 
_______________________________________________
discuss mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/discuss
