> On 22 Feb 2017, at 13:53, Simone Tiraboschi <[email protected]> wrote: > > > > On Wed, Feb 22, 2017 at 1:33 PM, Simone Tiraboschi <[email protected] > <mailto:[email protected]>> wrote: > When ovirt-ha-agent checks the status of the engine VM we get: > 2017-02-21 22:21:14,738-0500 ERROR (jsonrpc/2) [api] FINISH getStats > error=Virtual machine does not exist: {'vmId': > u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69) > Traceback (most recent call last): > File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in > method > ret = func(*args, **kwargs) > File "/usr/share/vdsm/API.py", line 335, in getStats > vm = self.vm > File "/usr/share/vdsm/API.py", line 130, in vm > raise exception.NoSuchVM(vmId=self._UUID) > NoSuchVM: Virtual machine does not exist: {'vmId': > u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} > > While in ovirt-ha-agent logs we have: > MainThread::INFO::2017-02-21 > 22:21:18,583::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) > Current state UnknownLocalVmState (score: 3400) > ... > MainThread::INFO::2017-02-21 > 22:21:31,199::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) > Unknown local engine vm status no actions taken > Probably it's a bug or a regression somewhere on master. > > On ovirt-ha-broker side the detection is based on a strict string match on > the error message that is expected to be exactly 'Virtual machine does not > exist' to set down status otherwise we set unknown status as in this case: > https://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-ha.git;a=blob;f=ovirt_hosted_engine_ha/broker/submonitors/engine_health.py;h=d633cb860b811e84021221771bf706a9a4ac1d63;hb=refs/heads/master#l54 > > <https://gerrit.ovirt.org/gitweb?p=ovirt-hosted-engine-ha.git;a=blob;f=ovirt_hosted_engine_ha/broker/submonitors/engine_health.py;h=d633cb860b811e84021221771bf706a9a4ac1d63;hb=refs/heads/master#l54> > > Adding Francesco here to understand if something has recently changed there > on vdsm side.
That’s not a very robust code handling. Yes, the text changed, the vm id was added. And yes, it may change again any time I guess > > > > On Wed, Feb 22, 2017 at 1:02 PM, Sandro Bonazzola <[email protected] > <mailto:[email protected]>> wrote: > Adding Lev > > On Wed, Feb 22, 2017 at 12:59 PM, Sahina Bose <[email protected] > <mailto:[email protected]>> wrote: > Hi all, > > On the HC setup, the HE VM is not restarted. > The agent.log has > MainThread::INFO::2017-02-21 > 22:09:58,022::state_machine::169::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) > Global metadata: {} > MainThread::INFO::2017-02-21 > 22:09:58,023::state_machine::177::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) > Local (id 1): {'engine-health': {'reason': 'failed to getVmStats', 'health': > 'unknown', 'vm': 'unknown', 'detail': 'unknown'}, 'bridge': True, 'mem-free': > 4079.0, 'maintenance': False, 'cpu-load': 0.0491, 'gateway': True} > ... > MainThread::INFO::2017-02-21 > 22:10:29,219::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) > Unknown local engine vm status no actions taken > MainThread::INFO::2017-02-21 > 22:10:29,219::brokerlink::111::ovirt_hosted_engine_ha.lib.br > <http://ovirt_hosted_engine_ha.lib.br/>okerlink.BrokerLink::(notify) Trying: > notify time=1487733029.22 type=state_transition > detail=ReinitializeFSM-UnknownLocalVmState > hostname='lago-hc-basic-suite-master-host0' > MainThread::INFO::2017-02-21 > 22:10:29,317::brokerlink::121::ovirt_hosted_engine_ha.lib.br > <http://ovirt_hosted_engine_ha.lib.br/>okerlink.BrokerLink::(notify) Success, > was notification of state_transition (ReinitializeFSM-UnknownLocalVmState) > sent? ignored > and the vdsm.log > > 2017-02-21 22:09:11,962-0500 INFO (libvirt/events) [virt.vm] > (vmId='2ccc0ef0-cc31-45b8-8e91-a78fa4cad671') Changed state to Down: User > shut down from within the guest (code=7) (vm:1269) > 2017-02-21 22:09:11,962-0500 INFO (libvirt/events) [virt.vm] > (vmId='2ccc0ef0-cc31-45b8-8e91-a78fa4cad671') Stopping connection > (guestagent:429) > > 2017-02-21 22:09:29,727-0500 ERROR (jsonrpc/4) [api] FINISH getStats > error=Virtual machine does not exist: {'vmId': > u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} (api:69) > Traceback (most recent call last): > File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 67, in > method > ret = func(*args, **kwargs) > File "/usr/share/vdsm/API.py", line 335, in getStats > vm = self.vm > File "/usr/share/vdsm/API.py", line 130, in vm > raise exception.NoSuchVM(vmId=self._UUID) > NoSuchVM: Virtual machine does not exist: {'vmId': > u'2ccc0ef0-cc31-45b8-8e91-a78fa4cad671'} > > > What should I be looking for to identify the issue? > > The logs are at > http://jenkins.ovirt.org/job/ovirt_master_hc-system-tests/lastCompletedBuild/artifact/exported-artifacts/test_logs/hc-basic-suite-master/post-002_bootstrap.py/lago-hc-basic-suite-master-host0 > > <http://jenkins.ovirt.org/job/ovirt_master_hc-system-tests/lastCompletedBuild/artifact/exported-artifacts/test_logs/hc-basic-suite-master/post-002_bootstrap.py/lago-hc-basic-suite-master-host0> > > thanks > sahina > > _______________________________________________ > Devel mailing list > [email protected] <mailto:[email protected]> > http://lists.ovirt.org/mailman/listinfo/devel > <http://lists.ovirt.org/mailman/listinfo/devel> > > > > -- > Sandro Bonazzola > Better technology. Faster innovation. Powered by community collaboration. > See how it works at redhat.com <http://redhat.com/> > > _______________________________________________ > Devel mailing list > [email protected] > http://lists.ovirt.org/mailman/listinfo/devel
_______________________________________________ Devel mailing list [email protected] http://lists.ovirt.org/mailman/listinfo/devel
