Re: [dpdk-dev] [ovs-dev] ovs-vswitchd with DPDK crashed when guest VM restarts network service

Kevin Traynor Wed, 13 Jan 2021 06:16:17 -0800

On 12/01/2021 18:20, Alex Yeh (ayeh) wrote:
> Hi Kevin, Stokes,
>       Resending just to make sure the email is not lost.
> Thanks and looking forward to your suggestion,
> Alex
>


+Cc vhost/virtio maintainers

Thanks for the report and for checking the newer versions. I think at this
stage you should file a bug at https://bugs.dpdk.org and include steps
the vhost/virtio maintainers can use to reproduce this issue.

> -----Original Message-----
> From: Alex Yeh (ayeh) 
> Sent: Friday, January 08, 2021 11:36 AM
> To: Kevin Traynor <ktray...@redhat.com>; Stokes, Ian <ian.sto...@intel.com>; 
> dev@dpdk.org
> Cc: Yegappan Lakshmanan (yega) <y...@cisco.com>
> Subject: RE: [dpdk-dev] [ovs-dev] ovs-vswitchd with DPDK crashed when guest 
> VM restarts network service
> 
> Hi Kevin, Stokes,
>       Thanks for the suggestion.
>       We have upgraded to OVS 2.11.4 and DPDK 18.11.10. OVS still crashes 
> with the same segfault when the application within the guest VM restarts. 
> Any suggestions on how to proceed?
> 
> Thanks
> Alex
> 
> [root@nfvis ~]# ovs-vswitchd --version
> ovs-vswitchd (Open vSwitch) 2.11.4
> DPDK 18.11.10
> 
> -----Original Message-----
> From: Kevin Traynor <ktray...@redhat.com>
> Sent: Thursday, November 19, 2020 4:09 AM
> To: Stokes, Ian <ian.sto...@intel.com>; Alex Yeh (ayeh) <a...@cisco.com>; 
> dev@dpdk.org
> Cc: Yegappan Lakshmanan (yega) <y...@cisco.com>
> Subject: Re: [dpdk-dev] [ovs-dev] ovs-vswitchd with DPDK crashed when guest 
> VM restarts network service
> 
> On 19/11/2020 11:21, Stokes, Ian wrote:
>>> Hi,
>>>                We are seeing an ovs-vswitchd service crash with a 
>>> segfault in the librte_vhost library when a DPDK application within a guest 
>>> VM is stopped.
>>>
>>>                We are using OVS 2.11.1 on CentOS 7.6 (3.10.0-1062 
>>> Linux kernel) with DPDK 18.11.2.
>>
>> Hi,
>>
>> Is there a reason you are using OVS 2.11.1 and DPDK 18.11.2?  These are 
>> quite old.
>>
>> As a first step I would recommend using the latest releases of these 
>> branches that have been validated by the OVS community.
>>
>> As of now this would be OVS 2.11.4 and DPDK 18.11.9. Please check whether 
>> the issue is still present there; my suspicion is that this could be an 
>> issue resolved in the DPDK library since 18.11.2.
>>
> 
> +1, there are 58 commits in the vhost library on the 18.11 branch since
> 18.11.2, so it might already be fixed. 18.11.10 is the latest release, while 
> the commit below has been in since 18.11.7.
> 
> $ git log --oneline v18.11.2..HEAD . | grep crash
> 90b5ba739f vhost: fix crash on port deletion
> 
> If you are planning to continue to use 18.11 for a while, I think you will 
> want to test the 18.11.11 Release Candidate that will be available in a few 
> weeks. It is the last planned 18.11 release, so any issues you find *after* 
> it is released won't be fixed.
> 
> Kevin.
> 
> 
> 
>> Regards
>> Ian
>>
>>>
>>>                We are using OVS-DPDK on the host and the guest VM is 
>>> running a DPDK application. With some traffic, if the application 
>>> service within the VM is restarted, then OVS crashes.
>>>
>>>                This crash is not seen if the guest VM is restarted 
>>> (instead of stopping the application within the VM).
>>>
>>>                The crash backtrace (attached below) points to the
>>> rte_memcpy_generic() function in rte_memcpy.h. It looks like the 
>>> crash occurs when vhost is trying to dequeue the packets from the 
>>> guest VM (as the application in the guest VM has stopped and the huge 
>>> pages are returned to the guest kernel).
>>>
>>>                We have tried enabling iommu in ovs by setting 
>>> "other_config:vhost-iommu-support=true" and enabling iommu in qemu 
>>> using the following configuration in the guest domain XML:
>>> <iommu model='intel'>
>>>     <driver intremap='on'/>
>>> </iommu>
>>>                With iommu enabled ovs-vswitchd still crashes when 
>>> guest VM restarts the network service.
>>>
>>>                Is this a known problem? Anyone else seen a crash like 
>>> this?  How can we protect the ovs-vswitchd from crashing when a guest 
>>> VM restarts the network application or service?
>>>
>>> Thanks
>>> Alex
>>> ------------------------------------------------------------------------
>>>
>>> Log:
>>> Oct 7 19:54:16 Branch81-Bravo kernel: [2245909.596635] pmd16[25721]:
>>> segfault at 7f4d1d733000 ip 00007f4d2ae5d066 sp 00007f4d1ce65618 
>>> error 4 in librte_vhost.so.4[7f4d2ae52000+1a000]
>>> Oct 7 19:54:19 Branch81-Bravo systemd[1]: ovs-vswitchd.service: main 
>>> process exited, code=killed, status=11/SEGV
>>>
>>> Environment:
>>> CentOs 7.6.1810
>>> openvswitch-2.11.1-1.el7.centos.x86_64
>>> openvswitch-kmod-2.11.1-1.el7.centos.x86_64
>>> dpdk-18.11-2.el7.centos.x86_64
>>> 3.10.0-1062.4.1.el7.x86_64
>>> qemu-kvm-ev-2.12.0-18.el7.centos_6.1.1
>>>
>>> Core dump trace:
>>> (gdb) bt
>>> #-1 0x00007ffff205602e in rte_memcpy_generic (dst=<optimized out>, 
>>> src=0x7fffcef3607c, n=<optimized out>) at
>>> /usr/src/debug/dpdk-18.11/x86_64-native-linuxapp-gcc/include/rte_memcpy.h:793
>>> Backtrace stopped: Cannot access memory at address 0x7ffff20558f0
>>>
>>> (gdb) list *0x00007ffff205602e
>>> 0x7ffff205602e is in rte_memcpy_generic
>>> (/usr/src/debug/dpdk-18.11/x86_64-native-linuxapp-gcc/include/rte_memcpy.h:793).
>>> 788 }
>>> 789
>>> 790 /**
>>> 791 * For copy with unaligned load
>>> 792 */
>>> 793 MOVEUNALIGNED_LEFT47(dst, src, n, srcofs);
>>> 794
>>> 795 /**
>>> 796 * Copy whatever left
>>> 797 */
>>>
>>> (gdb) list *0x00007ffff205c192
>>> 0x7ffff205c192 is in rte_vhost_dequeue_burst
>>> (/usr/src/debug/dpdk-18.11/lib/librte_vhost/virtio_net.c:1192).
>>> 1187 * In zero copy mode, one mbuf can only reference data
>>> 1188 * for one or partial of one desc buff.
>>> 1189 */
>>> 1190 mbuf_avail = cpy_len;
>>> 1191 } else {
>>> 1192 if (likely(cpy_len > MAX_BATCH_LEN ||
>>> 1193 vq->batch_copy_nb_elems >= vq->size ||
>>> 1194 (hdr && cur == m))) {
>>> 1195 rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *,
>>> 1196 mbuf_offset),
>>> (gdb)
>>>
>>> _______________________________________________
>>> dev mailing list
>>> d...@openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>
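
For a bug report it may help to decode the kernel "segfault at ..." line
quoted in the Log section above. The sketch below is only an illustration:
the function name decode_segfault and the dictionary keys are made up here,
and it assumes the usual x86 page-fault error-code bits (bit 0 = page
present, bit 1 = write access, bit 2 = user mode) and 4 KiB base pages.

```python
import re

# Kernel log line copied from the report quoted above.
LOG_LINE = (
    "pmd16[25721]: segfault at 7f4d1d733000 ip 00007f4d2ae5d066 "
    "sp 00007f4d1ce65618 error 4 in librte_vhost.so.4[7f4d2ae52000+1a000]"
)

# All numeric fields in this message are printed in hexadecimal.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

PAGE_SIZE = 4096  # assumption: 4 KiB base pages


def decode_segfault(line):
    """Hypothetical helper: split a kernel segfault line into its fields."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("no segfault record found in line")
    addr = int(m.group("addr"), 16)
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    err = int(m.group("err"), 16)
    return {
        "object": m.group("obj"),
        "fault_addr": addr,
        # Offset of the faulting instruction inside the mapped object.
        "offset_in_object": ip - base,
        # x86 page-fault error-code bits (assumed layout, see lead-in).
        "page_present": bool(err & 0x1),
        "write_access": bool(err & 0x2),
        "user_mode": bool(err & 0x4),
        "fault_addr_page_aligned": addr % PAGE_SIZE == 0,
    }


if __name__ == "__main__":
    info = decode_segfault(LOG_LINE)
    for key, value in info.items():
        shown = hex(value) if isinstance(value, int) and not isinstance(value, bool) else value
        print(f"{key}: {shown}")
```

For the line above this gives offset 0xb066 inside librte_vhost.so.4 and a
user-mode read of a page that is not present, at a page-aligned address.
That is consistent with the copy in the dequeue path reading guest memory
that is no longer mapped, but it does not prove it. The offset could be
passed to a tool such as addr2line against the same librte_vhost.so.4
build, assuming matching debug info is installed.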
