Hi,

A DPDK developer indicated the crash was resolved in DPDK 16.04, and indeed when running with DPDK 16.04 and the latest OVS from git, with "mrg_rxbuf=off" removed from QEMU's "-device virtio-net-pci", the crash is no longer observed. However, we are now witnessing OVS getting stuck:

2016-05-02T17:26:18.804Z|00111|ovs_rcu|WARN|blocked 1000 ms waiting for pmd145 to quiesce
2016-05-02T17:26:19.805Z|00112|ovs_rcu|WARN|blocked 2001 ms waiting for pmd145 to quiesce
2016-05-02T17:26:21.804Z|00113|ovs_rcu|WARN|blocked 4000 ms waiting for pmd145 to quiesce
2016-05-02T17:26:25.805Z|00114|ovs_rcu|WARN|blocked 8001 ms waiting for pmd145 to quiesce
2016-05-02T17:26:33.805Z|00115|ovs_rcu|WARN|blocked 16001 ms waiting for pmd145 to quiesce
2016-05-02T17:26:49.805Z|00116|ovs_rcu|WARN|blocked 32001 ms waiting for pmd145 to quiesce
2016-05-02T17:27:14.354Z|00072|ovs_rcu(vhost_thread2)|WARN|blocked 128000 ms waiting for pmd145 to quiesce
2016-05-02T17:27:15.841Z|00008|ovs_rcu(urcu3)|WARN|blocked 128001 ms waiting for pmd145 to quiesce
2016-05-02T17:27:21.805Z|00117|ovs_rcu|WARN|blocked 64000 ms waiting for pmd145 to quiesce
2016-05-02T17:28:25.804Z|00118|ovs_rcu|WARN|blocked 128000 ms waiting for pmd145 to quiesce

We observed the same stuck behavior with the OVS 2.5.0 release (with mrg_rxbuf=off), and we thought that issue had been resolved on the latest 2.5.0 branch, but when we removed "mrg_rxbuf=off" we started seeing OVS get stuck again. Are you familiar with this issue?

Thanks
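For context, the guest NIC is attached over vhost-user roughly along the following lines; this is only a sketch of the kind of QEMU arguments involved, where the socket path, object/device ids, memory size and MAC address are placeholders rather than our exact values. The earlier workaround was appending mrg_rxbuf=off to the -device virtio-net-pci line; the hang above shows up once that option is removed:

    -object memory-backend-file,id=mem0,size=16G,mem-path=/dev/hugepages,share=on \
    -numa node,memdev=mem0 \
    -mem-prealloc \
    -chardev socket,id=char0,path=/var/run/openvswitch/vhu0 \
    -netdev type=vhost-user,id=net0,chardev=char0 \
    -device virtio-net-pci,netdev=net0,mac=00:00:00:00:00:01,mrg_rxbuf=off

The memory-backend-file object with share=on is what allows the vhost-user backend in OVS/DPDK to map the guest's memory.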
On Tuesday, 5 April 2016 10:26 AM, "Loftus, Ciara" <ciara.lof...@intel.com> wrote:

Hi,

Since the segmentation fault is occurring in the DPDK vhost code, it might be a good idea to post this information to the d...@dpdk.org mailing list, where you might be able to get more feedback on the root cause.

Thanks,
Ciara

> • What you did that made the problem appear.
> We have an OpenStack Kilo setup with 3 controllers and 3 compute hosts. One of
> the controllers runs ODL, which manages the OVS on each compute host.
> The compute hosts run hLinux, HPE's Debian 8-based OS.
> Each host has 2 NUMA nodes, each with 12 cores (24 hyper-threaded) and 64 GB of memory.
> We patched Neutron to create vhost-user ports (not available in stable Kilo)
> in order to work with DPDK and achieve the highest possible throughput.
> OVS was running with "-c 4" and pmd-core-mask 0x38; all of these cores were isolated.
> Nova was configured with vcpu_pin_set=6-11, and the flavor had 6 vCPUs.
> The flavor had 16 1 GB huge pages, backed by real 1 GB huge pages on the host.
> We then ran a DPDK-based traffic generator inside 2 VMs, sending directly to the
> other VM's MAC and IP.
> • What you expected to happen.
> We expected traffic to flow.
> • What actually happened.
> OVS crashed (in DPDK code). Backtrace attached.
> • The Open vSwitch version number (as output by ovs-vswitchd --version).
> root@BASE-CCP-CPN-N0001-NETCLM:~# ovs-vswitchd --version
> ovs-vswitchd (Open vSwitch) 2.5.0
> Compiled Apr 4 2016 08:51:09
> • Any local patches or changes you have applied (if any).
> Applied ce179f1163f947fe8dc5afa35a2cdd0756bb53a0.
> The following are also handy sometimes:
> • The kernel version on which Open vSwitch is running (from /proc/version)
> and the distribution and version number of your OS (e.g. "Centos 5.0").
> root@BASE-CCP-CPN-N0001-NETCLM:~# cat /proc/version
> Linux version 3.14.48-1-amd64-hlinux (pbuilder@build) (gcc version 4.9.2
> (Debian 4.9.2-10) ) #hlinux1 SMP Thu Aug 6 16:02:22 UTC 2015
> • If you have Open vSwitch configured to connect to an OpenFlow controller,
> the output of ovs-ofctl show <bridge> for each <bridge> configured in the
> vswitchd configuration database.
> We are using ODL; outputs attached.
> • A fix or workaround, if you have one.
> We disabled mrg_rxbuf (mrg_rxbuf=off) in QEMU.
>
> We can supply more info if necessary, such as our exact build process.
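For completeness, the OVS/DPDK side of a setup like the one quoted above is configured roughly as follows under OVS 2.5; this is a sketch rather than the exact commands used, the bridge and port names are placeholders, and the socket/database paths depend on the install prefix. It reflects the "-c 4" EAL coremask and the 0x38 PMD mask mentioned in the report (the corresponding other_config key is pmd-cpu-mask):

    ovs-vswitchd --dpdk -c 4 -n 4 --socket-mem 1024,1024 \
        -- unix:/var/run/openvswitch/db.sock --pidfile --detach
    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x38
    ovs-vsctl add-br br-int -- set bridge br-int datapath_type=netdev
    ovs-vsctl add-port br-int vhu0 -- set Interface vhu0 type=dpdkvhostuser

The dpdkvhostuser port creates the vhost-user socket that QEMU's -chardev socket,path=... option then points at.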