2018-01-18 17:49 GMT+02:00 Michal Privoznik <mpriv...@redhat.com>:

> On 01/18/2018 08:25 AM, Ján Tomko wrote:
> > On Wed, Jan 17, 2018 at 04:45:38PM +0200, Serhii Kharchenko wrote:
> >> Hello libvirt-users list,
> >>
> >> We're catching the same bug since 3.4.0 version (3.3.0 works OK).
> >> So, we have a process that is permanently connected to libvirtd via socket
> >> and it is collecting stats, listening to events and controlling the VPSes.
> >>
> >> When we try to 'shutdown' a number of VPSes we often catch the bug.
> >> One of the VPSes sticks in 'in shutdown' state, no related 'qemu' process
> >> is present, and there is the next error in the log:
> >>
> >> Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.005+0000: 20438: warning : qemuGetProcessInfo:1460 : cannot parse process status data
> >> Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virFileReadAll:1420 : Failed to open file '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage': No such file or directory
> >> Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virCgroupGetValueStr:844 : Unable to read from '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage': No such file or directory
> >> Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virCgroupGetDomainTotalCpuStats:3319 : unable to get cpu account: Operation not permitted
> >> Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000: 20522: warning : qemuDomainObjBeginJobInternal:4862 : Cannot start job (destroy, none) for domain DOMAIN1; current job is (query, none) owned by (20440 remoteDispatchConnectGetAllDomainStats, 0 <null>) for (30s, 0s)
> >> Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000: 20522: error : qemuDomainObjBeginJobInternal:4874 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchConnectGetAllDomainStats)
> >>
> >> I think only the last line matters.
> >> The bug is highly reproducible. We can easily catch it even when we call
> >> multiple 'virsh shutdown' in shell one by one.
> >>
> >> When we shut down the process connected to the socket - everything
> >> becomes OK and the bug is gone.
> >>
> >> The system used is Gentoo Linux, tried all modern versions of libvirt
> >> (3.4.0, 3.7.0, 3.8.0, 3.9.0, 3.10.0, 4.0.0-rc2 (today's version from
> >> git)) and they have this bug. 3.3.0 works OK.
> >>
> >
> > I don't see anything obvious stats related in the diff between 3.3.0 and
> > 3.4.0. We have added reporting of the shutdown reason, but that's just
> > parsing one more JSON reply we previously ignored.
> >
> > Can you try running 'git bisect' to pinpoint the exact commit that
> > caused this issue?
>
> I am able to reproduce this issue, ran bisect and found that the commit
> which broke it is aeda1b8c56dc58b0a413acc61bbea938b40499e1.
>
> https://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=aeda1b8c56dc58b0a413acc61bbea938b40499e1;hp=ec337aee9b20091d6f9f60b78f210d55f812500b
>
> But it's very unlikely that the commit is causing the error. If anything
> it is just exposing whatever error we have there. I mean, if I revert
> the commit on the top of current HEAD I can no longer reproduce the issue.
>
> Michal
>
Michal, Ján,

I've got the same results:

Bisecting: 0 revisions left to test after this (roughly 0 steps)
[aeda1b8c56dc58b0a413acc61bbea938b40499e1] qemu: monitor: do not report error on shutdown

And yes, when I revert it in HEAD - the problem is gone.
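
In case it helps anyone comparing their own logs against this report: below is a minimal sketch that pulls the blocked job, the blocking job and its owner out of the "Cannot start job" warning quoted above. The regular expression is inferred from that single sample line (rejoined after mail wrapping), not from the libvirt sources, so the message layout may differ in other versions. The names parse_job_warning, JOB_WARNING and SAMPLE are made up for this illustration.

```python
import re

# Layout inferred from the one warning quoted in this thread; this is an
# assumption about the message format, not something taken from libvirt code.
JOB_WARNING = re.compile(
    r"Cannot start job \((?P<wanted>\w+), (?P<wanted_async>\w+)\) "
    r"for domain (?P<domain>\S+); "
    r"current job is \((?P<current>\w+), (?P<current_async>\w+)\) "
    r"owned by \((?P<owner_tid>\d+) (?P<owner_api>\w+), "
)


def parse_job_warning(line):
    """Return a dict describing the blocked and blocking jobs, or None."""
    match = JOB_WARNING.search(line)
    if match is None:
        return None
    info = match.groupdict()
    # The thread id is numeric in the sample, so expose it as an int.
    info["owner_tid"] = int(info["owner_tid"])
    return info


# The warning from the original report, with the mail line wraps removed.
SAMPLE = (
    "Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000: "
    "20522: warning : qemuDomainObjBeginJobInternal:4862 : Cannot start job "
    "(destroy, none) for domain DOMAIN1; current job is (query, none) owned by "
    "(20440 remoteDispatchConnectGetAllDomainStats, 0 <null>) for (30s, 0s)"
)

if __name__ == "__main__":
    info = parse_job_warning(SAMPLE)
    print(info["domain"], info["wanted"], "blocked by", info["owner_api"])
```

Run over a libvirtd log, this should make it easy to see whether the job holding the lock is always remoteDispatchConnectGetAllDomainStats, as in the report above, or whether other APIs show up as well.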
_______________________________________________
libvirt-users mailing list
libvirt-users@redhat.com
https://www.redhat.com/mailman/listinfo/libvirt-users