On Mon, Aug 9, 2021 at 4:01 PM Nir Soffer <nsof...@redhat.com> wrote:
>
> On Mon, Aug 9, 2021 at 2:42 PM Yedidyah Bar David <d...@redhat.com> wrote:
> >
> > On Mon, Aug 9, 2021 at 1:43 PM Nir Soffer <nsof...@redhat.com> wrote:
> > >
> > > On Mon, Aug 9, 2021 at 10:35 AM Yedidyah Bar David <d...@redhat.com> wrote:
> > > >
> > > > On Sun, Aug 8, 2021 at 5:42 PM Code Review <ger...@ovirt.org> wrote:
> > > > >
> > > > > From Jenkins CI <jenk...@ovirt.org>:
> > > > >
> > > > > Jenkins CI has posted comments on this change. (
> > > > > https://gerrit.ovirt.org/c/ovirt-system-tests/+/115392 )
> > > > >
> > > > > Change subject: HE: Use node image
> > > > > ......................................................................
> > > > >
> > > > >
> > > > > Patch Set 13: Continuous-Integration-1
> > > > >
> > > > > Build Failed
> > > > While trying to deactivate a host, the engine wanted to migrate a VM
> > > > (vm0) from host-0 to host-1. vdsm log of host-0 says:
> > > >
> > > > 2021-08-08 14:31:10,076+0000 ERROR (migsrc/cde311f9) [virt.vm] (vmId='cde311f9-9a33-4eb9-8338-fa22ff49edc2') Failed to migrate (migration:503)
> > > > Traceback (most recent call last):
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 477, in _regular_run
> > > >     time.time(), machineParams
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 578, in _startUnderlyingMigration
> > > >     self._perform_with_conv_schedule(duri, muri)
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 667, in _perform_with_conv_schedule
> > > >     self._perform_migration(duri, muri)
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 596, in _perform_migration
> > > >     self._migration_flags)
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 159, in call
> > > >     return getattr(self._vm._dom, name)(*a, **kw)
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f
> > > >     ret = attr(*args, **kwargs)
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
> > > >     ret = f(*args, **kwargs)
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
> > > >     return func(inst, *args, **kwargs)
> > > >   File "/usr/lib64/python3.6/site-packages/libvirt.py", line 2126, in migrateToURI3
> > > >     raise libvirtError('virDomainMigrateToURI3() failed')
> > > > libvirt.libvirtError: Unsafe migration: Migration without shared storage is unsafe
> > >
> > > Please share the vm xml:
> > >
> > >     sudo virsh -r dumpxml vm-name
> >
> > I think you should be able to find a dump of it in vdsm.log:
> >
> > https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/18650/artifact/check-patch.he-basic_suite_master.el8.x86_64/test_logs/ost-he-basic-suite-master-host-0/var/log/vdsm/vdsm.log
> >
> > I think the first line of starting a migration is:
> >
> > 2021-08-08 14:31:08,350+0000 DEBUG (jsonrpc/4) [jsonrpc.JsonRpcServer] Calling 'VM.migrate' in bridge with {'vmID': 'cde311f9-9a33-4eb9-8338-fa22ff49edc2', 'params':
> >
> > A few lines later:
> >
> > 2021-08-08 14:31:08,387+0000 DEBUG (migsrc/cde311f9) [virt.metadata.Descriptor] dumped metadata for cde311f9-9a33-4eb9-8338-fa22ff49edc2: <?xml version='1.0' encoding='utf-8'?>
> > <vm>
> >     <balloonTarget type="int">98304</balloonTarget>
>
> This is not the vm xml but the metadata xml.
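
As an aside, the same read-only domain XML that "virsh -r dumpxml vm-name"
prints can also be fetched with the libvirt Python bindings. A minimal
sketch, assuming the VM name "vm0" and a local qemu:///system connection
(both are illustrative, not taken from the thread):

    import libvirt

    # Read-only connection, same as "virsh -r"
    conn = libvirt.openReadOnly("qemu:///system")
    try:
        dom = conn.lookupByName("vm0")
        # Full domain XML, including the <disk> <source> paths that
        # libvirt looks at during migration
        print(dom.XMLDesc(0))
    finally:
        conn.close()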
OK

> Looking at the logs on both hosts:
>
> [nsoffer@sparse ost]$ head -1 *vdsm.log
> ==> host0-vdsm.log <==
> 2021-08-08 13:16:04,676+0000 INFO (MainThread) [vds] (PID: 65169) I am the actual vdsm 4.40.80.3.12.git6d67b935b ost-he-basic-suite-master-host-0 (4.18.0-326.el8.x86_64) (vdsmd:162)
>
> ==> host1-vdsm.log <==
> 2021-08-08 15:40:54,367+0200 INFO (MainThread) [vds] (PID: 23005) I am the actual vdsm 4.40.80.4.5.git4309a3949 ost-he-basic-suite-master-host-1 (4.18.0-326.el8.x86_64) (vdsmd:162)
>
> - The hosts' clocks are using different time zones (+0000 vs +0200) - is this intended?
> - You are testing different versions of vdsm - is this intended?

Both of these are a result of the patch this was run for - which makes
host-0 use ovirt-node for the he-basic suite.

> We have about 60 errors:
> $ grep 'Migration without shared storage is unsafe' host0-vdsm.log | wc -l
> 60
>
> Looking at the first migration that failed, the vm xml is here:
>
> 2021-08-08 14:20:34,127+0000 INFO (vm/cde311f9) [virt.vm] (vmId='cde311f9-9a33-4eb9-8338-fa22ff49edc2') <?xml version='1.0' encoding='utf-8'?>
> <domain xmlns:ns0="http://libvirt.org/schemas/domain/qemu/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0" type="kvm">
>     <name>vm0</name>
>     <uuid>cde311f9-9a33-4eb9-8338-fa22ff49edc2</uuid>
>     ...
>
> The relevant parts for storage are:
>
>     <disk device="disk" snapshot="no" type="file">
>         <driver cache="none" error_policy="stop" io="threads" iothread="1" name="qemu" type="qcow2" />
>         <source file="/rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87">
>             <seclabel model="dac" relabel="no" />
>         </source>
>         <target bus="virtio" dev="vda" />
>         <serial>20002ad2-4a97-4d2f-b3fc-c103477b5b91</serial>
>         <boot order="1" />
>         <alias name="ua-20002ad2-4a97-4d2f-b3fc-c103477b5b91" />
>         <address bus="0x05" domain="0x0000" function="0x0" slot="0x00" type="pci" />
>     </disk>
>     <disk device="disk" snapshot="no" type="block">
>         <driver cache="none" error_policy="stop" io="native" name="qemu" type="raw" />
>         <source dev="/dev/mapper/36001405bc9d94e4419b4b80a2f702e2f">
>             <seclabel model="dac" relabel="no" />
>         </source>
>         <target bus="scsi" dev="sda" />
>         <serial>738c8486-8929-44ec-9083-69327bde9c65</serial>
>         <alias name="ua-738c8486-8929-44ec-9083-69327bde9c65" />
>         <address bus="0" controller="0" target="0" type="drive" unit="0" />
>     </disk>
>
> So we have one qcow2 disk on file storage, and one direct lun.
>
> On the destination, the first migration attempt is here:
>
> 2021-08-08 16:31:08,437+0200 DEBUG (jsonrpc/2) [jsonrpc.JsonRpcServer] Calling 'VM.migrationCreate' in bridge with {'vmID': 'cde311f9-9a33-4eb9-8338-fa22ff49edc2', 'params': {'_srcDomXML': '<domain type=\'kvm\' id=\'6\' xmlns:qemu=\'http://libvirt.org/
> ...
>
> We prepare the qcow2 disk:
>
> 2021-08-08 16:31:09,313+0200 INFO (vm/cde311f9) [vdsm.api] FINISH
> prepareImage return={'path':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87',
> 'info': {'type': 'file', 'path':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87'},
> 'imgVolumesInfo': [{'domainID':
> '46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID':
> '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID':
> '1d3f07dc-b481-492f-a2a6-7c46689d82ba', 'path':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481-492f-a2a6-7c46689d82ba',
> 'leasePath':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481-492f-a2a6-7c46689d82ba.lease',
> 'leaseOffset': 0}, {'domainID':
> '46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID':
> '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID':
> '614abd56-4d4f-4412-aa2a-3f7bad2f3a87', 'path':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87',
> 'leasePath':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87.lease',
> 'leaseOffset': 0}, {'domainID':
> '46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID':
> '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID':
> 'a4309ef3-01bb-45db-8bf7-0f9498a7feeb', 'path':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/a4309ef3-01bb-45db-8bf7-0f9498a7feeb',
> 'leasePath':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/a4309ef3-01bb-45db-8bf7-0f9498a7feeb.lease',
> 'leaseOffset': 0}]} from=internal,
> task_id=f2dfbacb-154c-4f5b-b57d-affcbf419691 (api:54)
> 2021-08-08 16:31:09,314+0200 INFO (vm/cde311f9) [vds] prepared volume
> path:
> /rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87
> (clientIF:518)
>
> And the direct lun:
>
> 2021-08-08 16:31:09,315+0200 INFO (vm/cde311f9) [vdsm.api] START
> appropriateDevice(guid='36001405bc9d94e4419b4b80a2f702e2f',
> thiefId='cde311f9-9a33-4eb9-8338-fa22ff49edc2', deviceType='mpath')
> from=internal, task_id=220f1c1f-caec-4327-a157-7a4fab3b54a5 (api:48)
> 2021-08-08 16:31:09,550+0200 INFO (vm/cde311f9) [vdsm.api] FINISH
> appropriateDevice return={'truesize': '21474836480', 'apparentsize':
> '21474836480', 'path':
> '/dev/mapper/36001405bc9d94e4419b4b80a2f702e2f'} from=internal,
> task_id=220f1c1f-caec-4327-a157-7a4fab3b54a5 (api:54)
> 2021-08-08 16:31:09,550+0200 INFO (vm/cde311f9) [vds] prepared volume
> path: /dev/mapper/36001405bc9d94e4419b4b80a2f702e2f (clientIF:518)
>
> The interesting thing is that the qcow2 disk is using different path
> on the source and destination vms:
>
> source:
> /rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87
>
> destination:
> /rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87
>
> On the source we have:
> /rhev/data-center/pool-id/domain-id/images/image-id/volume-id
>
> On the destination we have:
> /rhev/data-center/mnt/mountpoint/domain-id/images/image-id/volume-id
>
> Both lead to the same disk, but libvirt probably compared the strings
> and decided that we don't have shared storage.
>
> It may be new validation in libvirt, or maybe engine changed the way the disk
> path is added to the xml recently?

And/or maybe this is related to one of the hosts being ovirt-node?

Some time ago Michal pushed this, to do the same on other suites:

    https://gerrit.ovirt.org/c/ovirt-system-tests/+/115913

and it did pass CI. Artifacts are gone by now, but it also tests
migrations. I've now rebased it; let's see how it goes.

> In vdsm we use os.path.realpath() when we compare disk paths since we know
> that we can have different paths to the same volume.
>
> This kind of issue is likely to be reproducible without ost.
>
> Nir

Thanks,
--
Didi
_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/OR2EMVMSAMOFR3VWYVOZWANTIHWZCIXC/
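
A minimal sketch (not part of the original thread) of the os.path.realpath()
comparison mentioned above: comparing the two /rhev/data-center path strings
says "different", while resolving symlinks first shows they point at the same
volume. The paths are copied from the logs above; the realpath comparison only
returns True on a host where both links actually exist.

    import os

    # Path used in the source vm xml (via the data-center/pool-id symlink)
    src = ("/rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/"
           "46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/"
           "20002ad2-4a97-4d2f-b3fc-c103477b5b91/"
           "614abd56-4d4f-4412-aa2a-3f7bad2f3a87")

    # Path prepared on the destination (via the mount point)
    dst = ("/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/"
           "46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/"
           "20002ad2-4a97-4d2f-b3fc-c103477b5b91/"
           "614abd56-4d4f-4412-aa2a-3f7bad2f3a87")

    # Comparing the strings, as the thread suspects libvirt does
    print(src == dst)  # False: the strings differ

    # Resolving symlinks first, as vdsm does
    print(os.path.realpath(src) == os.path.realpath(dst))  # True on such a host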