On Mon, Dec 11, 2017 at 6:44 PM, Milan Zamazal <[email protected]> wrote:
> Michal Skrivanek <[email protected]> writes:
>
> > Milan,
> >
> > log [1], VM b3962e5c-08e3-444e-910e-ea675fa1a5c7
> > migration away finished at 2017-12-05 06:26:24,515-0500
> > incoming migration of the same VM back, at 2017-12-05 06:26:46,614-0500
> >
> > seems to me the migration away didn't really properly clean up the VM.
> > Milan, can you check if the logs match? Perhaps an orphaned libvirt xml?
>
> It seems that, for unknown reasons, Engine didn't call Destroy on the
> source after the first migration. So while the VM was no longer running
> there, it was still present in Vdsm (in Down status) and thus the
> following create call was rejected in the API.
>
> There are weird messages in engine.log during the first migration, such
> as the following ones; I don't know whether they are harmless or not:
>
> 2017-12-05 06:26:19,799-05 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-89) [] VM 'b3962e5c-08e3-444e-910e-ea675fa1a5c7'(vm0) was unexpectedly detected as 'MigratingTo' on VDS '56bf5bdd-8a95-4b7a-9c72-7206d6b59e38'(lago-basic-suite-master-host-1) (expected on '104bd4da-24bb-4368-a5ca-21a465aca70e')
> 2017-12-05 06:26:19,799-05 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedThreadFactory-engineScheduled-Thread-89) [] VM 'b3962e5c-08e3-444e-910e-ea675fa1a5c7' is migrating to VDS '56bf5bdd-8a95-4b7a-9c72-7206d6b59e38'(lago-basic-suite-master-host-1) ignoring it in the refresh until migration is done

These logs are completely harmless - they only indicate that the VM's
run_on_vds still points to the source host and the hand-over was not done
yet.
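
To make the hand-over point concrete, here is an illustrative sketch in
Python of the decision described above. It is not the Engine code (that is
the Java org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer class named in
the log); the names below (analyze_vm_report, run_on_vds, reporting_host) are
invented for the illustration:

    # Illustrative sketch only, not Engine code: skip a VM report coming from
    # the migration destination until the hand-over updates run_on_vds.
    import logging

    log = logging.getLogger(__name__)

    def analyze_vm_report(vm, reporting_host):
        if vm.run_on_vds != reporting_host.id and vm.status == 'MigratingTo':
            # The destination already reports the VM, but run_on_vds still
            # points to the source host, so this report is ignored for now.
            log.info("VM '%s' is migrating to VDS '%s', ignoring it in the "
                     "refresh until migration is done",
                     vm.id, reporting_host.id)
            return
        # ... normal per-VM analysis would continue here ...
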
> The migration completion event was successfully obtained by Engine:
>
> 2017-12-05 06:26:24,542-05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-2) [] EVENT_ID: VM_MIGRATION_DONE(63), Migration completed (VM: vm0, Source: lago-basic-suite-master-host-0, Destination: lago-basic-suite-master-host-1, Duration: 6 seconds, Total: 6 seconds, Actual downtime: 73ms)
>
> > Otherwise it would need reproduction in CI and some more logs….
> >
> > [1]
> > http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4278/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/lago-basic-suite-master-host-0/_var_log/vdsm/vdsm.log
> >
> >> On 5 Dec 2017, at 13:26, Dafna Ron <[email protected]> wrote:
> >>
> >> Hi,
> >> We had a failure for test 006_migrations.migrate_vm on master.
> >>
> >> There was a libvirt disruption on the migration source, and after that
> >> vdsm reported the migration as failed because the VM already exists, which
> >> makes me suspect a split-brain case.
> >> The patch reported has nothing to do with this issue and I think we
> >> simply stumbled on a race condition which can cause a split brain.
> >> Please note that there are several metrics-related issues reported in
> >> the vdsm logs as well.
> >>
> >> Link and headline of suspected patches:
> >>
> >> Not related
> >>
> >> Link to Job:
> >>
> >> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4278/
> >>
> >> Link to all logs:
> >>
> >> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4278/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-006_migrations.py/
> >>
> >> (Relevant) error snippet from the log:
> >> <error>
> >>
> >> Engine:
> >> 2017-12-05 06:26:48,546-05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-385) [] EVENT_ID: VM_MIGRATION_TO_SERVER_FAILED(120), Migration failed (VM: vm0, Source: lago-basic-suite-master-host-1, Destination: lago-basic-suite-master-host-0).
> >>
> >> dst:
> >>
> >> 2017-12-05 06:26:46,615-0500 WARN (jsonrpc/6) [vds] vm b3962e5c-08e3-444e-910e-ea675fa1a5c7 already exists (API:179)
> >> 2017-12-05 06:26:46,615-0500 ERROR (jsonrpc/6) [api] FINISH create error=Virtual machine already exists (api:124)
> >> Traceback (most recent call last):
> >>   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 117, in method
> >>     ret = func(*args, **kwargs)
> >>   File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 180, in create
> >>     raise exception.VMExists()
> >> VMExists: Virtual machine already exists
> >> 2017-12-05 06:26:46,620-0500 INFO (jsonrpc/6) [api.virt] FINISH create return={'status': {'message': 'Virtual machine already exists', 'code': 4}} from=::ffff:192.168.201.3,50394 (api:52)
> >> 2017-12-05 06:26:46,620-0500 INFO (jsonrpc/6) [api.virt] FINISH migrationCreate return={'status': {'message': 'Virtual machine already exists', 'code': 4}} from=::ffff:192.168.201.3,50394 (api:52)
> >> 2017-12-05 06:26:46,620-0500 INFO (jsonrpc/6) [jsonrpc.JsonRpcServer] RPC call VM.migrationCreate failed (error 4) in 0.03 seconds (__init__:573)
> >> 2017-12-05 06:26:46,624-0500 INFO (jsonrpc/2) [api.virt] START destroy(gracefulAttempts=1) from=::ffff:192.168.201.3,50394 (api:46)
> >> 2017-12-05 06:26:46,624-0500 INFO (jsonrpc/2) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') Release VM resources (vm:4967)
> >> 2017-12-05 06:26:46,625-0500 WARN (jsonrpc/2) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') trying to set state to Powering down when already Down (vm:575)
> >> 2017-12-05 06:26:46,625-0500 INFO (jsonrpc/2) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') Stopping connection (guestagent:435)
> >> 2017-12-05 06:26:46,625-0500 INFO (jsonrpc/2) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') _destroyVmGraceful attempt #0 (vm:5004)
> >> 2017-12-05 06:26:46,626-0500 WARN (jsonrpc/2) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') VM 'b3962e5c-08e3-444e-910e-ea675fa1a5c7' couldn't be destroyed in libvirt: Requested operation is not valid: domain is not running (vm:5025)
> >> 2017-12-05 06:26:46,627-0500 INFO (jsonrpc/2) [vdsm.api] START teardownImage(sdUUID='952bb427-b88c-4fbe-99ef-49970d3aaf70', spUUID='9dcfeaaf-96b7-4e26-a327-5570e0e39261', imgUUID='e6eadbae-ec7a-48f4-8832-64a622a12bef', volUUID=None) from=::ffff:192.168.201.3,50394, task_id=2da93725-5533-4354-a369-751eb44f9ef2 (api:46)
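
(For context, this rejection is the duplicate-VM guard in Vdsm's
create/migrationCreate path, visible at API.py line 180 in the traceback
above. A minimal self-contained sketch of that guard - not the actual Vdsm
code; 'vm_container' and the VMExists class below are stand-ins - looks like
this:

    # Minimal sketch of the guard seen in the traceback above; not Vdsm code.
    # 'vm_container' stands in for Vdsm's in-memory map of vmId -> Vm object.
    class VMExists(Exception):
        """Stand-in for vdsm.common.exception.VMExists."""

    def create(vm_container, vm_params):
        vm_id = vm_params['vmId']
        if vm_id in vm_container:
            # Here the old VM was still tracked in Down status, because
            # Engine never sent Destroy after the first migration, so the
            # incoming migrationCreate was rejected with VMExists.
            raise VMExists()
        # ... otherwise a new Vm object would be set up for the incoming
        # migration ...

So the destination behaved as designed; the problem is that the stale Down VM
was still there.)
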
> >> src:
> >>
> >> 2017-12-05 06:26:46,623-0500 ERROR (migsrc/b3962e5c) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') migration destination error: Virtual machine already exists (migration:290)
> >>
> >> disruption on dst:
> >>
> >> 2017-12-05 06:20:04,662-0500 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call VM.shutdown succeeded in 0.00 seconds (__init__:573)
> >> 2017-12-05 06:20:04,676-0500 ERROR (Thread-1) [root] Shutdown by QEMU Guest Agent failed (vm:5097)
> >> Traceback (most recent call last):
> >>   File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 5088, in qemuGuestAgentShutdown
> >>     self._dom.shutdownFlags(libvirt.VIR_DOMAIN_SHUTDOWN_GUEST_AGENT)
> >>   File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
> >>     ret = attr(*args, **kwargs)
> >>   File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 126, in wrapper
> >>     ret = f(*args, **kwargs)
> >>   File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 512, in wrapper
> >>     return func(inst, *args, **kwargs)
> >>   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2403, in shutdownFlags
> >>     if ret == -1: raise libvirtError ('virDomainShutdownFlags() failed', dom=self)
> >> libvirtError: Guest agent is not responding: QEMU guest agent is not connected
> >> 2017-12-05 06:20:04,697-0500 INFO (libvirt/events) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') block threshold 1 exceeded on 'vda' (/rhev/data-center/mnt/blockSD/952bb427-b88c-4fbe-99ef-49970d3aaf70/images/e6eadbae-ec7a-48f4-8832-64a622a12bef/1f063c73-f5cd-44e8-a84f-7810857f82df) (drivemonitor:162)
> >> 2017-12-05 06:20:04,698-0500 INFO (libvirt/events) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') drive 'vda' threshold exceeded (storage:872)
> >> 2017-12-05 06:20:05,889-0500 INFO (jsonrpc/7) [api.host] START getAllVmStats() from=::1,41118 (api:46)
> >> 2017-12-05 06:20:05,891-0500 INFO (jsonrpc/7) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)} from=::1,41118 (api:52)
> >> 2017-12-05 06:20:05,892-0500 INFO (jsonrpc/7) [jsonrpc.JsonRpcServer] RPC call Host.getAllVmStats succeeded in 0.00 seconds (__init__:573)
> >> 2017-12-05 06:20:07,466-0500 INFO (libvirt/events) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') underlying process disconnected (vm:1024)
> >> 2017-12-05 06:20:07,469-0500 INFO (libvirt/events) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') Release VM resources (vm:4967)
> >> 2017-12-05 06:20:07,469-0500 INFO (libvirt/events) [virt.vm] (vmId='b3962e5c-08e3-444e-910e-ea675fa1a5c7') Stopping connection (guestagent:435)
> >> 2017-12-05 06:20:07,469-0500 INFO (libvirt/events) [vdsm.api] START teardownImage(sdUUID='952bb427-b88c-4fbe-99ef-49970d3aaf70', spUUID='9dcfeaaf-96b7-4e26-a327-5570e0e39261', imgUUID='e6eadbae-ec7a-48f4-8832-64a622a12bef', volUUID=None) from=internal, task_id=9efedf46-d3be-4e41-b7f9-a074ed6344f6 (api:46)
> >>
> >> </error>
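
One note on the "disruption on dst" part: the traceback there is only libvirt
refusing a guest-agent shutdown because qemu-guest-agent was not connected in
the guest, and the VM still went down a couple of seconds later (the
"underlying process disconnected" line). For anyone who wants to reproduce
that particular error outside of Vdsm, here is a small standalone
python-libvirt sketch (not Vdsm code; the URI and uuid are placeholders):

    # Standalone sketch: ask libvirt to shut a domain down via the QEMU guest
    # agent, falling back to an ACPI power-button event if the agent is not
    # connected. Not Vdsm code; URI and uuid below are placeholders.
    import libvirt

    def shutdown_vm(uuid, uri='qemu:///system'):
        conn = libvirt.open(uri)
        try:
            dom = conn.lookupByUUIDString(uuid)
            try:
                # This is the call that failed in the traceback above.
                dom.shutdownFlags(libvirt.VIR_DOMAIN_SHUTDOWN_GUEST_AGENT)
            except libvirt.libvirtError:
                # Guest agent unavailable; fall back to ACPI.
                dom.shutdownFlags(libvirt.VIR_DOMAIN_SHUTDOWN_ACPI_POWER_BTN)
        finally:
            conn.close()
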
_______________________________________________
Devel mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/devel
