> -----Original Message-----
> From: discuss [mailto:[email protected]] On Behalf Of Daniele
> Di Proietto
> Sent: Thursday, July 16, 2015 8:03 PM
> To: Stokes, Ian
> Cc: [email protected]
> Subject: Re: [ovs-discuss] OVS segmentation fault due to incorrect TX
> queue setup with netdev-dpdk
>
> Hi,
>
> Thanks for the very detailed report! I've sent two patches to the
> mailing list that should address the issues. Would you mind testing
> them?
>
> More comments inline.
>
> On 15/07/2015 11:49, "Stokes, Ian" <[email protected]> wrote:
>
> >Hi All,
> >
> >I've been investigating a segmentation fault caused by the incorrect
> >setup of TX queues for netdev-dpdk. It occurs in the following
> >scenario.
> >
> >Running OVS with DPDK on a system with 72 cores (hyper-threading
> >enabled) and using an Intel XL710 network card.
> >
> >The default behavior in OVS when adding a DPDK physical port is to
> >attempt to set up one TX queue for each core detected on the system,
> >plus one more queue for non-PMD threads.
> >
> >In this case, 73 TX queues will be requested in total.
> >
> >The standard behavior when initializing a DPDK port is to check the
> >number of queues being requested against the maximum number of queues
> >available on the device itself.
> >
> >This is done in dpdk_eth_dev_init() with the following code segment:
> >
> >    rte_eth_dev_info_get(dev->port_id, &info);
> >    dev->up.n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq);
> >    dev->real_n_txq = MIN(info.max_tx_queues, dev->up.n_txq);
> >
> >    diag = rte_eth_dev_configure(dev->port_id, dev->up.n_rxq,
> >                                 dev->real_n_txq, &port_conf);
> >
> >The smaller of the two values is selected as the real number of TX
> >queues that can be set up. This accommodates a situation where we
> >could have more cores on a system than we have TX queues on the
> >network device in DPDK.
> >
> >This has worked fine with the previous generation of Intel
> >interfaces, such as the 82599. However, it will not work with the
> >XL710.
> >
> >In DPDK, the XL710 has a total of 316 TX queues that can be used.
> >From the check above we would think we can allocate 73 of these TX
> >queues without issue. But the 316 available queues are subdivided
> >between different queue types.
> >
> >For a DPDK host application (in this case OVS), queues 1 to 64
> >inclusive can be used. However, queues 65 to 96 are strictly for
> >SRIOV TX queue use.
> >
> >The check for max_tx_queues above will identify the total number of
> >queues available (316), compare it to the number of queues being
> >requested (73) and will select 73 as real_n_txq. But this is not the
> >correct number of TX queues usable by OVS (64).
> >
> >We can cause the switch to segfault by doing the following.
> >
> >Add a DPDK physical port:
> >
> >sudo $OVS_DIR/utilities/ovs-vsctl add-br br0 -- set Bridge br0
> >datapath_type=netdev
> >sudo $OVS_DIR/utilities/ovs-vsctl add-port br0 dpdk0 -- set Interface
> >dpdk0 type=dpdk
> >
> >This will output the following warning:
> >
> >ovs-vsctl: Error detected while setting up 'dpdk0'. See ovs-vswitchd
> >log for details.
> >
> >Looking at the log we see:
> >
> >PMD: i40e_dev_tx_queue_setup(): Using simple tx path
> >PMD: i40e_pf_get_vsi_by_qindex(): queue_idx out of range. VMDQ
> >configured?
> >2015-07-15T01:22:48Z|00019|dpdk|ERR|eth dev tx queue setup error -5
> >2015-07-15T01:22:48Z|00020|dpif_netdev|ERR|dpdk0, cannot set multiq
> >2015-07-15T01:22:48Z|00021|dpif|WARN|netdev@ovs-netdev: failed to add
> >dpdk0 as port: Resource temporarily unavailable
> >
> >This is as expected.
> >This warning will be reported in dpdk_eth_dev_init() by the following
> >code segment when it attempts to initialize the 65th queue:
> >
> >    for (i = 0; i < dev->real_n_txq; i++) {
> >        diag = rte_eth_tx_queue_setup(dev->port_id, i,
> >                                      NIC_PORT_TX_Q_SIZE,
> >                                      dev->socket_id, NULL);
> >        if (diag) {
> >            VLOG_ERR("eth dev tx queue setup error %d", diag);
> >            return -diag;
> >        }
> >    }
> >
> >Then add an internal port to the same bridge:
> >
> >sudo $OVS_DIR/utilities/ovs-vsctl add-port br0 testif1 -- set
> >interface testif1 type=internal
> >
> >I was surprised to see that after adding the internal port, the DPDK
> >port that failed previously is now added as well. Is this expected
> >behavior?
>
> No, this is a genuine OVS bug. The first patch in the series
> addresses that.
>
> >Looking at the vswitch log I can see that both the internal port and
> >the DPDK port have port IDs now:
> >
> >2015-07-15T01:23:19Z|00024|bridge|INFO|bridge br0: added interface
> >testif1 on port 1
> >2015-07-15T01:23:19Z|00025|dpif_netdev|INFO|Created 1 pmd threads on
> >numa node 0
> >2015-07-15T01:23:19Z|00001|dpif_netdev(pmd40)|INFO|Core 0 processing
> >port 'dpdk0'
> >2015-07-15T01:23:19Z|00002|dpif_netdev(pmd40)|INFO|Core 0 processing
> >port 'dpdk0'
> >2015-07-15T01:23:19Z|00026|bridge|INFO|bridge br0: added interface
> >dpdk0 on port 2
> >2015-07-15T01:23:19Z|00027|bridge|INFO|bridge br0: using datapath ID
> >00006805ca2d3cb8
> >
> >If we assign an IP address to the internal port, we will segfault the
> >vswitch:
> >
> >sudo ip addr add 192.168.1.1/24 dev testif1
> >
> >This is caused by the internal interface broadcasting an ICMPv6
> >neighbor solicitation message. This packet is copied from kernel
> >memory to DPDK memory in the netdev_dpdk_send__() function.
> >The issue is that the qid passed to netdev_dpdk_send__() is 72. The
> >packet will eventually be transmitted by rte_eth_tx_burst with a TX
> >qid of 72.
> >In DPDK, queue 72 on the XL710 is for SRIOV use only, so it is not
> >initialized during the rte_eth_tx_queue_setup process above, and the
> >switch segfaults when an attempt is made to access it.
> >
> >In terms of a solution to this, I would appreciate some feedback on
> >what people think is the best approach.
> >
> >Ideally, DPDK could extend the number of sequential queues supported
> >for host DPDK applications.
> >Previous-generation cards supported 128 TX queues that could be used
> >by a host application, which is why this issue is not seen with them.
> >
> >This, however, would not fix the immediate issue and would be more of
> >a long-term solution. In the meantime it could be flagged in the
> >documentation as a known issue/corner case that is not supported.
> >
> >Alternatively, OVS could attempt to set up as many queues as possible
> >on the DPDK device itself. If an error is detected, the appropriate
> >fields, such as dev->real_n_txq, would have to be updated.
> >In this case we would set up 64 of the requested 73 and log a warning
> >message for the user. However, there may be issues with how the PMD
> >threads map to the correct TX queue IDs.
> >I've noticed that when netdev_dpdk_send__() is called the qid is 72,
> >and this value comes from dp_execute_cb(), where the tx_qid is taken
> >from the dp_netdev_pmd_thread.
> >
> >Any feedback would be appreciated.
>
> netdev-dpdk already supports working with a smaller number of txqs by
> using a spinlock.
>
> The ideal fix (IMHO) would be for DPDK to report the usable number of
> transmission queues. In the meantime we can retry with fewer
> transmission queues if the queue setup fails.
>
> I've implemented this workaround in the second patch of the series.
Thanks. I agree, ideally we could get this information from DPDK. The
DPDK team are aware of it, and hopefully we can get a fix into R2.2.

> >Thanks
> >Ian
>
> _______________________________________________
> discuss mailing list
> [email protected]
> http://openvswitch.org/mailman/listinfo/discuss
