On 12/01/2021 18:20, Alex Yeh (ayeh) wrote:
> Hi Kevin, Stokes,
> Resending just to make sure the email is not lost.
> Thanks and looking forward to your suggestion,
> Alex
>
+Cc vhost/virtio maintainers

Thanks for the report and for checking the newer versions. I think at this
stage you should log a report in https://bugs.dpdk.org and provide steps
for the vhost/virtio maintainers so they can reproduce this issue.

> -----Original Message-----
> From: Alex Yeh (ayeh)
> Sent: Friday, January 08, 2021 11:36 AM
> To: Kevin Traynor <ktray...@redhat.com>; Stokes, Ian <ian.sto...@intel.com>;
> dev@dpdk.org
> Cc: Yegappan Lakshmanan (yega) <y...@cisco.com>
> Subject: RE: [dpdk-dev] [ovs-dev] ovs-vswitchd with DPDK crashed when guest
> VM restarts network service
>
> Hi Kevin, Stokes,
> Thanks for the suggestion.
> We have upgraded to OVS 2.11.4 and DPDK 18.11.10. OVS still crashes
> with the same segfault error when the application within the guest VM restarts.
> Any suggestion on how to proceed?
>
> Thanks
> Alex
>
> [root@nfvis ~]# ovs-vswitchd --version
> ovs-vswitchd (Open vSwitch) 2.11.4
> DPDK 18.11.10
>
> -----Original Message-----
> From: Kevin Traynor <ktray...@redhat.com>
> Sent: Thursday, November 19, 2020 4:09 AM
> To: Stokes, Ian <ian.sto...@intel.com>; Alex Yeh (ayeh) <a...@cisco.com>;
> dev@dpdk.org
> Cc: Yegappan Lakshmanan (yega) <y...@cisco.com>
> Subject: Re: [dpdk-dev] [ovs-dev] ovs-vswitchd with DPDK crashed when guest
> VM restarts network service
>
> On 19/11/2020 11:21, Stokes, Ian wrote:
>>> Hi,
>>> We are seeing an ovs-vswitchd service crash with a
>>> segfault in the librte_vhost library when a DPDK application within a guest
>>> VM is stopped.
>>>
>>> We are using OVS 2.11.1 on CentOS 7.6 (3.10.0-1062
>>> Linux kernel) with DPDK 18.11.2.
>>
>> Hi,
>>
>> Is there a reason you are using OVS 2.11.1 and DPDK 18.11.2? These are
>> quite old.
>>
>> As a first step I would recommend using the latest releases of these branches
>> that have been validated by the OVS community.
>>
>> As of now this would be OVS 2.11.4 and DPDK 18.11.9; please check whether the
>> issue is still present there. My suspicion is that this could be an issue
>> resolved in the DPDK library since 18.11.2.
>>
>
> +1, there are 58 commits in the vhost library on the 18.11 branch since
> 18.11.2, so it might already be fixed. 18.11.10 is the latest release, while
> the fix below is included from 18.11.7 onwards.
>
> $ git log --oneline v18.11.2..HEAD . | grep crash
> 90b5ba739f vhost: fix crash on port deletion
>
> If you are planning to continue to use 18.11 for a while, I think you will
> want to test the 18.11.11 Release Candidate that will be available in a few
> weeks. It is the last planned 18.11 release, so any issues you find *after*
> it is released won't be fixed.
>
> Kevin.
>
>> Regards
>> Ian
>>
>>>
>>> We are using OVS-DPDK on the host and the guest VM is
>>> running a DPDK application. With some traffic, if the application
>>> service within the VM is restarted, then OVS crashes.
>>>
>>> This crash is not seen if the guest VM is restarted
>>> (instead of stopping the application within the VM).
>>>
>>> The crash traceback (attached below) points to the
>>> rte_memcpy_generic() function in rte_memcpy.h. It looks like the
>>> crash occurs when vhost is trying to dequeue the packets from the
>>> guest VM (as the application in the guest VM has stopped and the huge
>>> pages are returned to the guest kernel).
>>>
>>> We have tried enabling the IOMMU in OVS by setting
>>> "other_config:vhost-iommu-support=true" and enabling the IOMMU in QEMU
>>> using the following configuration in the guest domain XML:
>>> <iommu model='intel'>
>>>   <driver intremap='on'/>
>>> </iommu>
>>> With the IOMMU enabled, ovs-vswitchd still crashes when the
>>> guest VM restarts the network service.
>>>
>>> Is this a known problem? Has anyone else seen a crash like
>>> this? How can we protect ovs-vswitchd from crashing when a guest
>>> VM restarts the network application or service?
>>>
>>> Thanks
>>> Alex
>>> ------------------------------------------------------------------------
>>>
>>> Log:
>>> Oct 7 19:54:16 Branch81-Bravo kernel: [2245909.596635] pmd16[25721]:
>>> segfault at 7f4d1d733000 ip 00007f4d2ae5d066 sp 00007f4d1ce65618
>>> error 4 in librte_vhost.so.4[7f4d2ae52000+1a000]
>>> Oct 7 19:54:19 Branch81-Bravo systemd[1]: ovs-vswitchd.service: main
>>> process exited, code=killed, status=11/SEGV
>>>
>>> Environment:
>>> CentOS 7.6.1810
>>> openvswitch-2.11.1-1.el7.centos.x86_64
>>> openvswitch-kmod-2.11.1-1.el7.centos.x86_64
>>> dpdk-18.11-2.el7.centos.x86_64
>>> 3.10.0-1062.4.1.el7.x86_64
>>> qemu-kvm-ev-2.12.0-18.el7.centos_6.1.1
>>>
>>> Core dump trace:
>>> (gdb) bt
>>> #-1 0x00007ffff205602e in rte_memcpy_generic (dst=<optimized out>,
>>> src=0x7fffcef3607c, n=<optimized out>) at
>>> /usr/src/debug/dpdk-18.11/x86_64-native-linuxapp-gcc/include/rte_memcpy.h:793
>>> Backtrace stopped: Cannot access memory at address 0x7ffff20558f0
>>>
>>> (gdb) list *0x00007ffff205602e
>>> 0x7ffff205602e is in rte_memcpy_generic
>>> (/usr/src/debug/dpdk-18.11/x86_64-native-linuxapp-gcc/include/rte_memcpy.h:793).
>>> 788         }
>>> 789
>>> 790         /**
>>> 791          * For copy with unaligned load
>>> 792          */
>>> 793         MOVEUNALIGNED_LEFT47(dst, src, n, srcofs);
>>> 794
>>> 795         /**
>>> 796          * Copy whatever left
>>> 797          */
>>>
>>> (gdb) list *0x00007ffff205c192
>>> 0x7ffff205c192 is in rte_vhost_dequeue_burst
>>> (/usr/src/debug/dpdk-18.11/lib/librte_vhost/virtio_net.c:1192).
>>> 1187             * In zero copy mode, one mbuf can only reference data
>>> 1188             * for one or partial of one desc buff.
>>> 1189             */
>>> 1190            mbuf_avail = cpy_len;
>>> 1191        } else {
>>> 1192            if (likely(cpy_len > MAX_BATCH_LEN ||
>>> 1193                    vq->batch_copy_nb_elems >= vq->size ||
>>> 1194                    (hdr && cur == m))) {
>>> 1195                rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *,
>>> 1196                        mbuf_offset),
>>> (gdb)
>>>
>>> _______________________________________________
>>> dev mailing list
>>> d...@openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>