On Mon, Aug 9, 2021 at 4:01 PM Nir Soffer <nsof...@redhat.com> wrote:
>
> On Mon, Aug 9, 2021 at 2:42 PM Yedidyah Bar David <d...@redhat.com> wrote:
> >
> > On Mon, Aug 9, 2021 at 1:43 PM Nir Soffer <nsof...@redhat.com> wrote:
> > >
> > > On Mon, Aug 9, 2021 at 10:35 AM Yedidyah Bar David <d...@redhat.com> 
> > > wrote:
> > > >
> > > > On Sun, Aug 8, 2021 at 5:42 PM Code Review <ger...@ovirt.org> wrote:
> > > > >
> > > > > From Jenkins CI <jenk...@ovirt.org>:
> > > > >
> > > > > Jenkins CI has posted comments on this change. ( 
> > > > > https://gerrit.ovirt.org/c/ovirt-system-tests/+/115392 )
> > > > >
> > > > > Change subject: HE: Use node image
> > > > > ......................................................................
> > > > >
> > > > >
> > > > > Patch Set 13: Continuous-Integration-1
> > > > >
> > > > > Build Failed
> > > >
> > > > While trying to deactivate a host, the engine wanted to migrate a VM
> > > > (vm0) from host-0 to host-1. The vdsm log on host-0 says:
> > > >
> > > > 2021-08-08 14:31:10,076+0000 ERROR (migsrc/cde311f9) [virt.vm]
> > > > (vmId='cde311f9-9a33-4eb9-8338-fa22ff49edc2') Failed to migrate
> > > > (migration:503)
> > > > Traceback (most recent call last):
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line
> > > > 477, in _regular_run
> > > >     time.time(), machineParams
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line
> > > > 578, in _startUnderlyingMigration
> > > >     self._perform_with_conv_schedule(duri, muri)
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line
> > > > 667, in _perform_with_conv_schedule
> > > >     self._perform_migration(duri, muri)
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line
> > > > 596, in _perform_migration
> > > >     self._migration_flags)
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line
> > > > 159, in call
> > > >     return getattr(self._vm._dom, name)(*a, **kw)
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 
> > > > 101, in f
> > > >     ret = attr(*args, **kwargs)
> > > >   File 
> > > > "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py",
> > > > line 131, in wrapper
> > > >     ret = f(*args, **kwargs)
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/common/function.py",
> > > > line 94, in wrapper
> > > >     return func(inst, *args, **kwargs)
> > > >   File "/usr/lib64/python3.6/site-packages/libvirt.py", line 2126, in
> > > > migrateToURI3
> > > >     raise libvirtError('virDomainMigrateToURI3() failed')
> > > > libvirt.libvirtError: Unsafe migration: Migration without shared
> > > > storage is unsafe
> > >
> > > Please share the vm xml:
> > >
> > >     sudo virsh -r dumpxml vm-name
> >
> > I think you should be able to find a dump of it in vdsm.log:
> >
> > https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/18650/artifact/check-patch.he-basic_suite_master.el8.x86_64/test_logs/ost-he-basic-suite-master-host-0/var/log/vdsm/vdsm.log
> >
> > I think the first line related to starting a migration is:
> >
> > 2021-08-08 14:31:08,350+0000 DEBUG (jsonrpc/4) [jsonrpc.JsonRpcServer]
> > Calling 'VM.migrate' in bridge with {'vmID':
> > 'cde311f9-9a33-4eb9-8338-fa22ff49edc2', 'params':
> >
> > A few lines later:
> >
> > 2021-08-08 14:31:08,387+0000 DEBUG (migsrc/cde311f9)
> > [virt.metadata.Descriptor] dumped metadata for
> > cde311f9-9a33-4eb9-8338-fa22ff49edc2: <?xml version='1.0'
> > encoding='utf-8'?>
> > <vm>
> >     <balloonTarget type="int">98304</balloonTarget>
>
> This is not the vm xml but the metadata xml.

OK

>
> Looking at the logs on both hosts:
>
> [nsoffer@sparse ost]$ head -1 *vdsm.log
> ==> host0-vdsm.log <==
> 2021-08-08 13:16:04,676+0000 INFO  (MainThread) [vds] (PID: 65169) I
> am the actual vdsm 4.40.80.3.12.git6d67b935b
> ost-he-basic-suite-master-host-0 (4.18.0-326.el8.x86_64) (vdsmd:162)
>
> ==> host1-vdsm.log <==
> 2021-08-08 15:40:54,367+0200 INFO  (MainThread) [vds] (PID: 23005) I
> am the actual vdsm 4.40.80.4.5.git4309a3949
> ost-he-basic-suite-master-host-1 (4.18.0-326.el8.x86_64) (vdsmd:162)
>
> - The hosts' clocks use different time zones (+0000 vs +0200) - is
> this intended?
> - You are testing different versions of vdsm - is this intended?

Both of these are a result of the patch this was run for, which makes
host-0 use ovirt-node for the he-basic suite.

>
> We have about 60 errors:
> $ grep 'Migration without shared storage is unsafe' host0-vdsm.log | wc -l
> 60
>
> Looking at the first migration that failed, the vm xml is here:
>
> 2021-08-08 14:20:34,127+0000 INFO  (vm/cde311f9) [virt.vm]
> (vmId='cde311f9-9a33-4eb9-8338-fa22ff49edc2') <?xml version='1.0'
> encoding='utf-8'?>
> <domain xmlns:ns0="http://libvirt.org/schemas/domain/qemu/1.0"
> xmlns:ovirt-vm="http://ovirt.org/vm/1.0" type="kvm">
>     <name>vm0</name>
>     <uuid>cde311f9-9a33-4eb9-8338-fa22ff49edc2</uuid>
> ...
>
> The relevant parts for storage are:
>
>         <disk device="disk" snapshot="no" type="file">
>             <driver cache="none" error_policy="stop" io="threads"
> iothread="1" name="qemu" type="qcow2" />
>             <source
> file="/rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87">
>                 <seclabel model="dac" relabel="no" />
>             </source>
>             <target bus="virtio" dev="vda" />
>             <serial>20002ad2-4a97-4d2f-b3fc-c103477b5b91</serial>
>             <boot order="1" />
>             <alias name="ua-20002ad2-4a97-4d2f-b3fc-c103477b5b91" />
>             <address bus="0x05" domain="0x0000" function="0x0"
> slot="0x00" type="pci" />
>         </disk>
>         <disk device="disk" snapshot="no" type="block">
>             <driver cache="none" error_policy="stop" io="native"
> name="qemu" type="raw" />
>             <source dev="/dev/mapper/36001405bc9d94e4419b4b80a2f702e2f">
>                 <seclabel model="dac" relabel="no" />
>             </source>
>             <target bus="scsi" dev="sda" />
>             <serial>738c8486-8929-44ec-9083-69327bde9c65</serial>
>             <alias name="ua-738c8486-8929-44ec-9083-69327bde9c65" />
>             <address bus="0" controller="0" target="0" type="drive" unit="0" 
> />
>         </disk>
>
> So we have one qcow2 disk on file storage, and one direct lun.
>
> On the destination, the first migration attempt is here:
>
> 2021-08-08 16:31:08,437+0200 DEBUG (jsonrpc/2) [jsonrpc.JsonRpcServer]
> Calling 'VM.migrationCreate' in bridge with {'vmID':
> 'cde311f9-9a33-4eb9-8338-fa22ff49edc2', 'params': {'_srcDomXML':
> '<domain type=\'kvm\' id=\'6\' xmlns:qemu=\'http://libvirt.org/
> ...
>
> We prepare the qcow2 disk:
>
> 2021-08-08 16:31:09,313+0200 INFO  (vm/cde311f9) [vdsm.api] FINISH
> prepareImage return={'path':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87',
> 'info': {'type': 'file', 'path':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87'},
> 'imgVolumesInfo': [{'domainID':
> '46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID':
> '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID':
> '1d3f07dc-b481-492f-a2a6-7c46689d82ba', 'path':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481-492f-a2a6-7c46689d82ba',
> 'leasePath': 
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481-492f-a2a6-7c46689d82ba.lease',
> 'leaseOffset': 0}, {'domainID':
> '46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID':
> '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID':
> '614abd56-4d4f-4412-aa2a-3f7bad2f3a87', 'path':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87',
> 'leasePath': 
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87.lease',
> 'leaseOffset': 0}, {'domainID':
> '46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID':
> '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID':
> 'a4309ef3-01bb-45db-8bf7-0f9498a7feeb', 'path':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/a4309ef3-01bb-45db-8bf7-0f9498a7feeb',
> 'leasePath': 
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/a4309ef3-01bb-45db-8bf7-0f9498a7feeb.lease',
> 'leaseOffset': 0}]} from=internal,
> task_id=f2dfbacb-154c-4f5b-b57d-affcbf419691 (api:54)
> 2021-08-08 16:31:09,314+0200 INFO  (vm/cde311f9) [vds] prepared volume
> path: 
> /rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87
> (clientIF:518)
>
> And the direct lun:
>
> 2021-08-08 16:31:09,315+0200 INFO  (vm/cde311f9) [vdsm.api] START
> appropriateDevice(guid='36001405bc9d94e4419b4b80a2f702e2f',
> thiefId='cde311f9-9a33-4eb9-8338-fa22ff49edc2', deviceType='mpath')
> from=internal, task_id=220f1c1f-caec-4327-a157-7a4fab3b54a5 (api:48)
> 2021-08-08 16:31:09,550+0200 INFO  (vm/cde311f9) [vdsm.api] FINISH
> appropriateDevice return={'truesize': '21474836480', 'apparentsize':
> '21474836480', 'path':
> '/dev/mapper/36001405bc9d94e4419b4b80a2f702e2f'} from=internal,
> task_id=220f1c1f-caec-4327-a157-7a4fab3b54a5 (api:54)
> 2021-08-08 16:31:09,550+0200 INFO  (vm/cde311f9) [vds] prepared volume
> path: /dev/mapper/36001405bc9d94e4419b4b80a2f702e2f (clientIF:518)
>
> The interesting thing is that the qcow2 disk uses a different path on
> the source and destination VMs:
>
> source:
> /rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87
>
> destination:
> /rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87
>
> On the source we have:
> /rhev/data-center/pool-id/domain-id/images/image-id/volume-id
>
> On the destination we have:
> /rhev/data-center/mnt/mountpoint/domain-id/images/image-id/volume-id
>
> Both lead to the same disk, but libvirt probably compared the strings
> and decided that
> we don't have shared storage.
>
> It may be new validation in libvirt, or maybe the engine recently
> changed the way the disk path is added to the XML?

And/or maybe this is related to one of the hosts being ovirt-node?

Some time ago Michal pushed this, to do the same on other suites:

https://gerrit.ovirt.org/c/ovirt-system-tests/+/115913

and it did pass CI. The artifacts are gone by now, but it does also test
migrations. I have now rebased it; let's see how it goes.
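
To double-check the point above that both paths lead to the same disk,
something like this on the source host should print True (just a
suggestion, not something I ran; the paths are copied from the logs
above):

import os

src = ("/rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/"
       "46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/"
       "20002ad2-4a97-4d2f-b3fc-c103477b5b91/"
       "614abd56-4d4f-4412-aa2a-3f7bad2f3a87")
dst = ("/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/"
       "46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/"
       "20002ad2-4a97-4d2f-b3fc-c103477b5b91/"
       "614abd56-4d4f-4412-aa2a-3f7bad2f3a87")

# samefile() follows symlinks and compares st_dev/st_ino, so it tells us
# whether the two path forms really refer to the same volume.
print(os.path.samefile(src, dst))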

>
> In vdsm we use os.path.realpath() when we compare disk paths since we know
> that we can have different paths to the same volume.
>
> This kind of issue is likely to be reproducible without OST.
>
> Nir
>
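
For reference, a minimal self-contained sketch of that realpath point
(this is not vdsm or libvirt code, and the /rhev-like symlink layout
below is a simplified stand-in for the real tree): a plain string
comparison of the two path forms says they differ, while resolving
symlinks first shows they are the same file.

import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Fake mount-point path of the storage domain, as used on the
    # destination host.
    mnt_domain = os.path.join(
        root, "mnt", "192.168.200.2:_exports_nfs_share1", "domain-id")
    image_dir = os.path.join(mnt_domain, "images", "image-id")
    os.makedirs(image_dir)
    volume = os.path.join(image_dir, "volume-id")
    open(volume, "w").close()

    # Pool-id symlink to the same domain, as used on the source host.
    pool_dir = os.path.join(root, "pool-id")
    os.makedirs(pool_dir)
    os.symlink(mnt_domain, os.path.join(pool_dir, "domain-id"))

    src = os.path.join(pool_dir, "domain-id", "images", "image-id",
                       "volume-id")
    dst = volume

    # Naive string comparison - the kind of check that would report
    # "no shared storage" here.
    print(src == dst)                                         # False
    # Symlink-resolving comparison, like vdsm's os.path.realpath().
    print(os.path.realpath(src) == os.path.realpath(dst))     # True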

Thanks,
-- 
Didi