On Tue, Jun 30, 2020 at 9:30 PM Nir Soffer <[email protected]> wrote:
>
> On Tue, Jun 30, 2020 at 10:37 AM Yedidyah Bar David <[email protected]> wrote:
> >
> > On Tue, Jun 30, 2020 at 9:37 AM Michal Skrivanek
> > <[email protected]> wrote:
> > >
> > > > On 30 Jun 2020, at 08:30, Yedidyah Bar David <[email protected]> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I am trying to verify fixes for ovirt-engine-rename, specifically for
> > > > OVN. Engine top patch is [1], OST patch [2]. Ran the manual job on
> > > > these [3].
> > > >
> > > > In previous patches, OST failed in earlier tests. Now, it passed these
> > > > tests, so I hope that my patches are enough for what I am trying to
> > > > do. However, [3] did fail later, during test_hotplug_cpu - it set the
> > > > number of CPUs, then tried to connect to the VM, and timed out.
> > > >
> > > > The logs imply that right after it changed the number of CPUs, the VM
> > > > was rebooted, apparently by libvirtd. Relevant log snippets:
> > > >
> > > > vdsm [4]:
> > > >
> > > > 2020-06-29 10:21:10,889-0400 DEBUG (jsonrpc/1) [virt.vm] (vmId='7474280d-4501-4355-9425-63898757682b') Setting number of cpus to : 2 (vm:3089)
> > > > 2020-06-29 10:21:10,952-0400 INFO (jsonrpc/1) [api.virt] FINISH setNumberOfCpus return={'status': {'code': 0, 'message': 'Done'}, 'vmList': {}} from=::ffff:192.168.201.4,54576, flow_id=7f9503ed, vmId=7474280d-4501-4355-9425-63898757682b (api:54)
> > > > 2020-06-29 10:21:11,111-0400 DEBUG (periodic/0) [virt.sampling.VMBulkstatsMonitor] sampled timestamp 2925.602824355 elapsed 0.160 acquired True domains all (sampling:451)
> > > > 2020-06-29 10:21:11,430-0400 DEBUG (jsonrpc/1) [jsonrpc.JsonRpcServer] Return 'VM.setNumberOfCpus' in bridge with {} (__init__:356)
> > > > 2020-06-29 10:21:11,432-0400 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call VM.setNumberOfCpus succeeded in 0.56 seconds (__init__:312)
> > > > 2020-06-29 10:21:12,228-0400 INFO (libvirt/events) [virt.vm] (vmId='7474280d-4501-4355-9425-63898757682b') reboot event (vm:1033)
> > > >
> > > > qemu [5]:
> > > >
> > > > 2020-06-29T14:21:12.260303Z qemu-kvm: terminating on signal 15 from pid 42224 (<unknown process>)
> > > > 2020-06-29 14:21:12.462+0000: shutting down, reason=destroyed
> > > >
> > > > libvirtd [6] itself does not log anything relevant AFAICT, but at
> > > > least it shows that the above unknown process is itself:
> > > >
> > > > 2020-06-29 14:18:16.212+0000: 42224: error : qemuMonitorIO:620 : internal error: End of file from qemu monitor
>
> Is this from libvirt log?

Yes

> Why would the libvirt log contain the libvirtd pid?

Perhaps because it forks several children, and thus includes the pid in its
logging format string? That's standard practice.

> > > > (Note that the above line is from 3 minutes before the reboot, and is the
> > > > only place in the log with '42224'. No other log there has 42224,
> > > > other than these and audit.log.)
> > > >
> > > > Any idea? Is this a bug in libvirt? vdsm? I'd at least expect
> > > > something in the log for such a severe step.
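By the way, in case anyone wants to repeat that pid cross-check on the
collected artifacts: a plain 'grep -rn 42224' over the host logs is enough,
of course, but below is a rough sketch of the same idea as a small script.
It is not part of OST or vdsm; the log paths are only examples taken from the
links in [4]-[6] plus audit.log, so adjust them as needed. It just prints
every line that mentions a given pid, which shows which daemon it belongs to:

#!/usr/bin/python3
# Rough sketch, not part of OST/vdsm: scan a few of the collected host logs
# for a pid and print every line mentioning it, to see which daemon it
# belongs to. Paths are examples from the exported artifacts; adjust as needed.

import sys
from pathlib import Path

LOGS = [
    "/var/log/libvirt.log",
    "/var/log/vdsm/vdsm.log",
    "/var/log/libvirt/qemu/vm0.log",
    "/var/log/sanlock.log",
    "/var/log/audit/audit.log",
]

def find_pid(pid):
    needle = str(pid)
    for name in LOGS:
        path = Path(name)
        if not path.is_file():
            continue
        for lineno, line in enumerate(path.read_text(errors="replace").splitlines(), 1):
            if needle in line:
                print("%s:%d: %s" % (name, lineno, line))

if __name__ == "__main__":
    # e.g. ./find_pid.py 42224
    find_pid(sys.argv[1] if len(sys.argv) > 1 else "42224")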
>
> Is this the hosted engine vm?

No, it's "vm0". (It's the basic suite, not hosted-engine).

> If we had trouble with storage, maybe
> sanlock killed the vm because it could not renew the lease.
>
> What do we have in /var/log/sanlock.log?

https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/7031/artifact/exported-artifacts/test_logs/basic-suite-master/post-004_basic_sanity_pytest.py/lago-basic-suite-master-host-0/_var_log/sanlock.log

Nothing that seems relevant - last line is:

2020-06-29 10:14:29 2523 [44292]: s4:r6 resource bdabb997-2075-4bb6-8217-b3a99d1bd599:da464550-1f81-443a-907d-39b763f13751:/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/bdabb997-2075-4bb6-8217-b3a99d1bd599/dom_md/xleases:3145728

> Also, do we have any WARN logs in vdsm? If there was an issue with
> storage you would see warnings about blocked checkers every 10 seconds.

20 WARN lines. Last one before the problem:

2020-06-29 10:18:33,569-0400 WARN (vm/da464550) [virt.vm] (vmId='da464550-1f81-443a-907d-39b763f13751') Cannot find device alias for _conf:{'type': 'lease', 'device': 'lease', 'lease_id': 'da464550-1f81-443a-907d-39b763f13751', 'sd_id': 'bdabb997-2075-4bb6-8217-b3a99d1bd599', 'path': '/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/bdabb997-2075-4bb6-8217-b3a99d1bd599/dom_md/xleases', 'offset': '3145728'} _deviceXML:None alias:None config:<bound method Base.config of <lease.Device sd_id=bdabb997-2075-4bb6-8217-b3a99d1bd599, lease_id=da464550-1f81-443a-907d-39b763f13751, path=/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/bdabb997-2075-4bb6-8217-b3a99d1bd599/dom_md/xleases, offset=3145728 at 0x7f18ec1c9d58>> createXmlElem:<bound method Device.createXmlElem of <lease.Device sd_id=bdabb997-2075-4bb6-8217-b3a99d1bd599, lease_id=da464550-1f81-443a-907d-39b763f13751, path=/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/bdabb997-2075-4bb6-8217-b3a99d1bd599/dom_md/xleases, offset=3145728 at 0x7f18ec1c9d58>> custom:{} device:lease deviceId:None deviceType:None from_xml_tree:<bound method Device.from_xml_tree of <class 'vdsm.virt.vmdevices.lease.Device'>> getXML:<bound method Device.getXML of <lease.Device sd_id=bdabb997-2075-4bb6-8217-b3a99d1bd599, lease_id=da464550-1f81-443a-907d-39b763f13751, path=/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/bdabb997-2075-4bb6-8217-b3a99d1bd599/dom_md/xleases, offset=3145728 at 0x7f18ec1c9d58>> get_extra_xmls:<bound method Base.get_extra_xmls of <lease.Device sd_id=bdabb997-2075-4bb6-8217-b3a99d1bd599, lease_id=da464550-1f81-443a-907d-39b763f13751, path=/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/bdabb997-2075-4bb6-8217-b3a99d1bd599/dom_md/xleases, offset=3145728 at 0x7f18ec1c9d58>> get_identifying_attrs:<bound method Device.get_identifying_attrs of <class 'vdsm.virt.vmdevices.lease.Device'>> get_metadata:<bound method Base.get_metadata of <lease.Device sd_id=bdabb997-2075-4bb6-8217-b3a99d1bd599, lease_id=da464550-1f81-443a-907d-39b763f13751, path=/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/bdabb997-2075-4bb6-8217-b3a99d1bd599/dom_md/xleases, offset=3145728 at 0x7f18ec1c9d58>> hotunplug_event:<threading.Event object at 0x7f18c24ce780> is_hostdevice:False lease_id:da464550-1f81-443a-907d-39b763f13751 log:<SimpleLogAdapter virt.vm (DEBUG)> offset:3145728 path:/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/bdabb997-2075-4bb6-8217-b3a99d1bd599/dom_md/xleases sd_id:bdabb997-2075-4bb6-8217-b3a99d1bd599 setup:<bound method Base.setup of <lease.Device sd_id=bdabb997-2075-4bb6-8217-b3a99d1bd599, lease_id=da464550-1f81-443a-907d-39b763f13751, path=/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/bdabb997-2075-4bb6-8217-b3a99d1bd599/dom_md/xleases, offset=3145728 at 0x7f18ec1c9d58>> specParams:{} teardown:<bound method Base.teardown of <lease.Device sd_id=bdabb997-2075-4bb6-8217-b3a99d1bd599, lease_id=da464550-1f81-443a-907d-39b763f13751, path=/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/bdabb997-2075-4bb6-8217-b3a99d1bd599/dom_md/xleases, offset=3145728 at 0x7f18ec1c9d58>> type:lease update_device_info:<bound method Device.update_device_info of <class 'vdsm.virt.vmdevices.lease.Device'>> vmid:None (core:329)

Next one after it:

2020-06-29 10:21:16,643-0400 WARN (libvirt/events) [root] Attempting to remove a non existing network: ovirtmgmt/7474280d-4501-4355-9425-63898757682b (libvirtnetwork:200)
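And for completeness, this is roughly how one could pull the WARN lines vdsm
logged in the minutes before the reboot event, to check for the blocked-checker
pattern you describe. It is only a sketch, not something from vdsm or OST, and
it assumes the timestamp format seen in the excerpts above
(e.g. '2020-06-29 10:21:12,228-0400'):

#!/usr/bin/python3
# Rough sketch, not part of vdsm/OST: print the WARN lines vdsm logged in the
# N minutes before a given event (e.g. the reboot event at
# 2020-06-29 10:21:12,228-0400). A stuck storage domain should show up here
# as "blocked" checker warnings roughly every 10 seconds.

from datetime import datetime, timedelta

VDSM_LOG = "/var/log/vdsm/vdsm.log"       # path as in the collected artifacts
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f%z"      # assumed from the log excerpts above
TS_LEN = len("2020-06-29 10:21:12,228-0400")

def parse_ts(line):
    try:
        return datetime.strptime(line[:TS_LEN], TS_FORMAT)
    except ValueError:
        return None                       # continuation line without a timestamp

def warns_before(event, minutes=5, path=VDSM_LOG):
    start = event - timedelta(minutes=minutes)
    with open(path, errors="replace") as f:
        for line in f:
            ts = parse_ts(line)
            if ts and start <= ts <= event and " WARN " in line:
                print(line.rstrip())

if __name__ == "__main__":
    warns_before(datetime.strptime("2020-06-29 10:21:12,228-0400", TS_FORMAT))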
> > > I’d suggest to rerun.
> >
> > ok

It passed:

https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/7033/

Feel free to have a look and compare with the failed one.

Thanks and best regards,

> > > I don’t trust the CI env at all. Could be any reason.
> >
> > If the qemu process had simply died, I'd agree it's likely to be an
> > infra issue. But if libvirt indeed killed it, I'd say it's either a
> > bug there, or some weird flow (which requires more logging, at least).
> >
> > > It’s highly unlikely to be caused by your patch, and I can see on my
> > > infra that OST is running well on both CentOS and Stream.
> >
> > Well, also basic-suite has been passing for quite some time now
> > without any failures:
> >
> > https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/
> >
> > Thanks,
> >
> > > >
> > > > [1] https://gerrit.ovirt.org/109961
> > > > [2] https://gerrit.ovirt.org/109734
> > > > [3] https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/7031/
> > > > [4] https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/7031/artifact/exported-artifacts/test_logs/basic-suite-master/post-004_basic_sanity_pytest.py/lago-basic-suite-master-host-0/_var_log/vdsm/vdsm.log
> > > > [5] https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/7031/artifact/exported-artifacts/test_logs/basic-suite-master/post-004_basic_sanity_pytest.py/lago-basic-suite-master-host-0/_var_log/libvirt/qemu/vm0.log
> > > > [6] https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/7031/artifact/exported-artifacts/test_logs/basic-suite-master/post-004_basic_sanity_pytest.py/lago-basic-suite-master-host-0/_var_log/libvirt.log
> > > >
> > > > --
> > > > Didi
> >
> > --
> > Didi

--
Didi
_______________________________________________
Devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/N6E7GQNR5OAKY3ZC2D7UQCLXBNCE7SOD/
