On Tue, Jun 30, 2020 at 9:30 PM Nir Soffer <[email protected]> wrote:
>
> On Tue, Jun 30, 2020 at 10:37 AM Yedidyah Bar David <[email protected]> wrote:
> >
> > On Tue, Jun 30, 2020 at 9:37 AM Michal Skrivanek
> > <[email protected]> wrote:
> > >
> > >
> > >
> > > > On 30 Jun 2020, at 08:30, Yedidyah Bar David <[email protected]> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I am trying to verify fixes for ovirt-engine-rename, specifically for
> > > > OVN. Engine top patch is [1], OST patch [2]. Ran the manual job on
> > > > these [3].
> > > >
> > > > In previous patches, OST failed in earlier tests. Now, it passed these
> > > > tests, so I hope that my patches are enough for what I am trying to
> > > > do. However, [3] did fail later, during test_hotplug_cpu - it set the
> > > > number of CPUs, then tried to connect to the VM, and timed out.
> > > >
> > > > The logs imply that right after it changed the number of CPUs, the VM
> > > > was rebooted, apparently by libvirtd. Relevant log snippets:
> > > >
> > > > vdsm [4]:
> > > >
> > > > 2020-06-29 10:21:10,889-0400 DEBUG (jsonrpc/1) [virt.vm]
> > > > (vmId='7474280d-4501-4355-9425-63898757682b') Setting number of cpus
> > > > to : 2 (vm:3089)
> > > > 2020-06-29 10:21:10,952-0400 INFO  (jsonrpc/1) [api.virt] FINISH
> > > > setNumberOfCpus return={'status': {'code': 0, 'message': 'Done'},
> > > > 'vmList': {}} from=::ffff:192.168.201.4,54576, flow_id=7f9503ed,
> > > > vmId=7474280d-4501-4355-9425-63898757682b (api:54)
> > > > 2020-06-29 10:21:11,111-0400 DEBUG (periodic/0)
> > > > [virt.sampling.VMBulkstatsMonitor] sampled timestamp 2925.602824355
> > > > elapsed 0.160 acquired True domains all (sampling:451)
> > > > 2020-06-29 10:21:11,430-0400 DEBUG (jsonrpc/1) [jsonrpc.JsonRpcServer]
> > > > Return 'VM.setNumberOfCpus' in bridge with {} (__init__:356)
> > > > 2020-06-29 10:21:11,432-0400 INFO  (jsonrpc/1) [jsonrpc.JsonRpcServer]
> > > > RPC call VM.setNumberOfCpus succeeded in 0.56 seconds (__init__:312)
> > > > 2020-06-29 10:21:12,228-0400 INFO  (libvirt/events) [virt.vm]
> > > > (vmId='7474280d-4501-4355-9425-63898757682b') reboot event (vm:1033)
> > > >
> > > > qemu [5]:
> > > >
> > > > 2020-06-29T14:21:12.260303Z qemu-kvm: terminating on signal 15 from
> > > > pid 42224 (<unknown process>)
> > > > 2020-06-29 14:21:12.462+0000: shutting down, reason=destroyed
> > > >
> > > > libvirtd [6] itself does not log anything relevant AFAICT, but at
> > > > least it shows that the above unknown process is itself:
> > > >
> > > > 2020-06-29 14:18:16.212+0000: 42224: error : qemuMonitorIO:620 :
> > > > internal error: End of file from qemu monitor
>
> Is this from libvirt log?

Yes

> Why would the libvirt log contain the libvirtd pid?

Perhaps because it forks several children, and therefore includes the
pid in its logging format string? That's standard practice.
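
For reference, a minimal sketch of how I cross-checked which daemon
that pid belongs to - just a throwaway script, assuming you download
the exported logs locally and pass their paths on the command line:

    import sys

    # Scan each log file given on the command line for the pid that sent
    # SIGTERM to qemu (42224 in this run), printing every hit with its
    # source file, so it's easy to see which daemon the pid belongs to.
    PID = "42224"

    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            for lineno, line in enumerate(f, 1):
                if PID in line:
                    print(f"{path}:{lineno}: {line.rstrip()}")

That's how I checked that nothing else in the collected logs mentions
42224.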

>
> > > > (Note that the above line is from 3 minutes before the reboot, and is
> > > > the only place in that log with '42224'. No other log there has 42224,
> > > > other than these and audit.log.)
> > > >
> > > > Any idea? Is this a bug in libvirt? vdsm? I'd at least expect
> > > > something in the log for such a severe step.
>
> Is this the hosted engine vm?

No, it's "vm0". (It's the basic suite, not hosted-engine).

> If we had trouble with storage, maybe
> sanlock killed the vm because it could not renew the lease.
>
> What do we have in /var/log/sanlock.log?

https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/7031/artifact/exported-artifacts/test_logs/basic-suite-master/post-004_basic_sanity_pytest.py/lago-basic-suite-master-host-0/_var_log/sanlock.log

Nothing that seems relevant - last line is:

2020-06-29 10:14:29 2523 [44292]: s4:r6 resource
bdabb997-2075-4bb6-8217-b3a99d1bd599:da464550-1f81-443a-907d-39b763f13751:/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/bdabb997-2075-4bb6-8217-b3a99d1bd599/dom_md/xleases:3145728
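
For completeness, this is roughly the filter I went over it with - a
minimal sketch, assuming sanlock.log is downloaded locally; the keyword
list is only my guess at how a lease failure would be worded, not
sanlock's actual message format:

    from datetime import datetime

    # Show everything sanlock logged around the reboot (10:21:12 host
    # local time), plus any line that looks like a lease renewal/acquire
    # problem.
    REBOOT = datetime(2020, 6, 29, 10, 21, 12)
    KEYWORDS = ("error", "fail", "renewal")

    with open("sanlock.log", errors="replace") as f:
        for line in f:
            try:
                ts = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
            except ValueError:
                continue
            near_reboot = abs((ts - REBOOT).total_seconds()) < 600
            if near_reboot or any(k in line.lower() for k in KEYWORDS):
                print(line.rstrip())

In this run it only turns up routine resource lines like the one above.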

>
> Also, do we have any WARN logs in vdsm? if there was an issue with
> storage you would see warnings about blocked checkers every 10 seconds.

20 WARN lines. Last one before the problem:

2020-06-29 10:18:33,569-0400 WARN  (vm/da464550) [virt.vm]
(vmId='da464550-1f81-443a-907d-39b763f13751') Cannot find device alias
for _conf:{'type': 'lease', 'device': 'lease', 'lease_id':
'da464550-1f81-443a-907d-39b763f13751', 'sd_id':
'bdabb997-2075-4bb6-8217-b3a99d1bd599', 'path':
'/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/bdabb997-2075-4bb6-8217-b3a99d1bd599/dom_md/xleases',
'offset': '3145728'} _deviceXML:None alias:None config:<bound method
Base.config of <lease.Device
sd_id=bdabb997-2075-4bb6-8217-b3a99d1bd599,
lease_id=da464550-1f81-443a-907d-39b763f13751,
path=/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/bdabb997-2075-4bb6-8217-b3a99d1bd599/dom_md/xleases,
offset=3145728 at 0x7f18ec1c9d58>> createXmlElem:<bound method
Device.createXmlElem of <lease.Device
sd_id=bdabb997-2075-4bb6-8217-b3a99d1bd599,
lease_id=da464550-1f81-443a-907d-39b763f13751,
path=/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/bdabb997-2075-4bb6-8217-b3a99d1bd599/dom_md/xleases,
offset=3145728 at 0x7f18ec1c9d58>> custom:{} device:lease
deviceId:None deviceType:None from_xml_tree:<bound method
Device.from_xml_tree of <class 'vdsm.virt.vmdevices.lease.Device'>>
getXML:<bound method Device.getXML of <lease.Device
sd_id=bdabb997-2075-4bb6-8217-b3a99d1bd599,
lease_id=da464550-1f81-443a-907d-39b763f13751,
path=/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/bdabb997-2075-4bb6-8217-b3a99d1bd599/dom_md/xleases,
offset=3145728 at 0x7f18ec1c9d58>> get_extra_xmls:<bound method
Base.get_extra_xmls of <lease.Device
sd_id=bdabb997-2075-4bb6-8217-b3a99d1bd599,
lease_id=da464550-1f81-443a-907d-39b763f13751,
path=/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/bdabb997-2075-4bb6-8217-b3a99d1bd599/dom_md/xleases,
offset=3145728 at 0x7f18ec1c9d58>> get_identifying_attrs:<bound method
Device.get_identifying_attrs of <class
'vdsm.virt.vmdevices.lease.Device'>> get_metadata:<bound method
Base.get_metadata of <lease.Device
sd_id=bdabb997-2075-4bb6-8217-b3a99d1bd599,
lease_id=da464550-1f81-443a-907d-39b763f13751,
path=/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/bdabb997-2075-4bb6-8217-b3a99d1bd599/dom_md/xleases,
offset=3145728 at 0x7f18ec1c9d58>> hotunplug_event:<threading.Event
object at 0x7f18c24ce780> is_hostdevice:False
lease_id:da464550-1f81-443a-907d-39b763f13751 log:<SimpleLogAdapter
virt.vm (DEBUG)> offset:3145728
path:/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/bdabb997-2075-4bb6-8217-b3a99d1bd599/dom_md/xleases
sd_id:bdabb997-2075-4bb6-8217-b3a99d1bd599 setup:<bound method
Base.setup of <lease.Device
sd_id=bdabb997-2075-4bb6-8217-b3a99d1bd599,
lease_id=da464550-1f81-443a-907d-39b763f13751,
path=/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/bdabb997-2075-4bb6-8217-b3a99d1bd599/dom_md/xleases,
offset=3145728 at 0x7f18ec1c9d58>> specParams:{} teardown:<bound
method Base.teardown of <lease.Device
sd_id=bdabb997-2075-4bb6-8217-b3a99d1bd599,
lease_id=da464550-1f81-443a-907d-39b763f13751,
path=/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/bdabb997-2075-4bb6-8217-b3a99d1bd599/dom_md/xleases,
offset=3145728 at 0x7f18ec1c9d58>> type:lease
update_device_info:<bound method Device.update_device_info of <class
'vdsm.virt.vmdevices.lease.Device'>> vmid:None (core:329)

Next one after it:

2020-06-29 10:21:16,643-0400 WARN  (libvirt/events) [root] Attempting
to remove a non existing network:
ovirtmgmt/7474280d-4501-4355-9425-63898757682b (libvirtnetwork:200)
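
In case it helps, a quick sketch of how I pulled these out - nothing
fancy, assuming vdsm.log is downloaded locally:

    # Count the WARN lines in vdsm.log and show the last one before and
    # the first one after the reboot event (10:21:12 host local time).
    REBOOT = "2020-06-29 10:21:12"

    with open("vdsm.log", errors="replace") as f:
        warns = [line.rstrip() for line in f if " WARN " in line]

    before = [w for w in warns if w[:19] <= REBOOT]
    after = [w for w in warns if w[:19] > REBOOT]

    print(f"{len(warns)} WARN lines")
    if before:
        print("last before the reboot:", before[-1])
    if after:
        print("first after the reboot:", after[0])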

>
> > > I’d suggest to rerun.
> >
> > ok

It passed:

https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/7033/

Feel free to have a look and compare with the failed one.

Thanks and best regards,

> >
> > > I don’t trust the CI env at all. Could be any reason.
> >
> > If the qemu process had simply died, I'd agree it's likely to be an
> > infra issue.
> > But if libvirt indeed killed it, I'd say it's either a bug there, or
> > some weird flow (which requires more logging, at least).
> >
> > > It’s highly unlikely to be caused by your patch, and I can see on my 
> > > infra that OST is running well on both CentOS and Stream.
> >
> > Well, basic-suite has also been passing for quite some time now
> > without any failures:
> >
> > https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/
> >
> > Thanks,
> >
> > >
> > > >
> > > > [1] https://gerrit.ovirt.org/109961
> > > > [2] https://gerrit.ovirt.org/109734
> > > > [3] 
> > > > https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/7031/
> > > > [4] 
> > > > https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/7031/artifact/exported-artifacts/test_logs/basic-suite-master/post-004_basic_sanity_pytest.py/lago-basic-suite-master-host-0/_var_log/vdsm/vdsm.log
> > > > [5] 
> > > > https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/7031/artifact/exported-artifacts/test_logs/basic-suite-master/post-004_basic_sanity_pytest.py/lago-basic-suite-master-host-0/_var_log/libvirt/qemu/vm0.log
> > > > [6] 
> > > > https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/7031/artifact/exported-artifacts/test_logs/basic-suite-master/post-004_basic_sanity_pytest.py/lago-basic-suite-master-host-0/_var_log/libvirt.log
> > > > --
> > > > Didi
> > >
> >
> >
> > --
> > Didi
>


-- 
Didi
_______________________________________________
Devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/N6E7GQNR5OAKY3ZC2D7UQCLXBNCE7SOD/
