On Sun, Dec 18, 2016 at 5:26 PM, Nir Soffer <[email protected]> wrote:

> On Sun, Dec 18, 2016 at 4:17 PM, Barak Korren <[email protected]> wrote:
> > After we've fixed the various system test issues that arose from the
> > CentOS 7.3 release, we're now seeing a new failure that seems to have
> > to do with snapshot merges.
> >
> > I'm guessing this may have to do with something that went in last week
> > while we "weren't looking".
> >
> > Failing job can be seen here:
> > http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/4275
> >
> > The test code snippet that is failing is as follows:
> >
> >     226     api.vms.get(VM0_NAME).snapshots.list()[-2].delete()
> >     227     testlib.assert_true_within_short(
> >     228         lambda:
> >     229         (len(api.vms.get(VM0_NAME).snapshots.list()) == 2) and
> >     230         (api.vms.get(VM0_NAME).snapshots.list()[-1].snapshot_status
> >     231          == 'ok'),
> >     232     )
> >
> > The failure itself is a test timeout:
> >
> >   ...
> >   File "/home/jenkins/workspace/test-repo_ovirt_experimental_master/ovirt-system-tests/basic-suite-master/test-scenarios/004_basic_sanity.py",
> > line 228, in snapshot_merge
> >     lambda:
> >   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line
> > 248, in assert_true_within_short
> >     allowed_exceptions=allowed_exceptions,
> >   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line
> > 240, in assert_true_within
> >     raise AssertionError('Timed out after %s seconds' % timeout)
> > AssertionError: Timed out after 180 seconds
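
(For readers unfamiliar with the helper: assert_true_within_short just polls
the condition until a timeout expires. The snippet below is only a sketch of
that pattern, reconstructed from the traceback above - the timeout constant,
polling interval and internals are assumptions, not the actual
ovirtlago/testlib code.)

    import time

    SHORT_TIMEOUT = 180  # assumed; the traceback reports a 180-second timeout

    def assert_true_within(condition, timeout, allowed_exceptions=()):
        """Poll `condition` until it returns True or `timeout` seconds pass."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                if condition():
                    return
            except tuple(allowed_exceptions):
                # Transient errors (e.g. an entity briefly missing) are
                # tolerated while polling; anything else propagates.
                pass
            time.sleep(3)  # assumed polling interval
        raise AssertionError('Timed out after %s seconds' % timeout)

    def assert_true_within_short(condition, allowed_exceptions=()):
        assert_true_within(condition, SHORT_TIMEOUT,
                           allowed_exceptions=allowed_exceptions)

So the assertion only tells us the snapshot list never reached the expected
state within 180 seconds, not why it did not.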
> >
> > Engine log generated during the test can be found here:
> >
> > http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/4275/artifact/exported-artifacts/basic_suite_master.sh-el7/exported-artifacts/test_logs/basic-suite-master/post-004_basic_sanity.py/lago-basic-suite-master-engine/_var_log_ovirt-engine/engine.log
>
> Quickly looking in the engine log does not reveal anything useful.
>
> Understanding issues like this requires the vdsm log from the host
> performing the operation.
>
> Looking in vdsm log:
>
> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/4275/artifact/exported-artifacts/basic_suite_master.sh-el7/exported-artifacts/test_logs/basic-suite-master/post-004_basic_sanity.py/lago-basic-suite-master-host1/_var_log_vdsm/vdsm.log
>
> 2016-12-18 07:54:44,325 INFO  (jsonrpc/1) [dispatcher] Run and
> protect: getDeviceList(storageType=3, guids=(), checkStatus=False,
> options={}) (logUtils:49)
>
> 2016-12-18 07:54:44,565 INFO  (jsonrpc/1) [dispatcher] Run and
> protect: getDeviceList, Return response: {'devList': [{'status':
> 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480', 'fwrev':
> '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '',
> 'pathlist': [{'initiatorname': 'default', 'connection':
> '192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal':
> '1', 'user': 'username', 'password': '********', 'port': '3260'}],
> 'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus':
> [{'type': 'iSCSI', 'physdev': 'sdd', 'capacity': '21474836480',
> 'state': 'active', 'lun': '3'}], 'devtype': 'iSCSI',
> 'physicalblocksize': '512', 'pvUUID': '', 'serial':
> 'SLIO-ORG_lun3_bdev_b94d3d26-d6e4-4bfc-9d91-4898731b721f', 'GUID':
> '36001405b94d3d26d6e44bfc9d9148987', 'productID': 'lun3_bdev'},
> {'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity':
> '21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '',
> 'pvsize': '', 'pathlist': [{'initiatorname': 'default', 'connection':
> '192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal':
> '1', 'user': 'username', 'password': '********', 'port': '3260'}],
> 'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus':
> [{'type': 'iSCSI', 'physdev': 'sdc', 'capacity': '21474836480',
> 'state': 'active', 'lun': '2'}], 'devtype': 'iSCSI',
> 'physicalblocksize': '512', 'pvUUID': '', 'serial':
> 'SLIO-ORG_lun2_bdev_eddd3ce8-2ea8-4326-a9a8-ebc609a768db', 'GUID':
> '36001405eddd3ce82ea84326a9a8ebc60', 'productID': 'lun2_bdev'},
> {'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity':
> '21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '',
> 'pvsize': '', 'pathlist': [{'initiatorname': 'default', 'connection':
> '192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal':
> '1', 'user': 'username', 'password': '********', 'port': '3260'}],
> 'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus':
> [{'type': 'iSCSI', 'physdev': 'sda', 'capacity': '21474836480',
> 'state': 'active', 'lun': '0'}], 'devtype': 'iSCSI',
> 'physicalblocksize': '512', 'pvUUID': '', 'serial':
> 'SLIO-ORG_lun0_bdev_2e4947ab-dee9-47b5-b55c-9e28a18d6831', 'GUID':
> '360014052e4947abdee947b5b55c9e28a', 'productID': 'lun0_bdev'},
> {'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity':
> '21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '',
> 'pvsize': '', 'pathlist': [{'initiatorname': 'default', 'connection':
> '192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal':
> '1', 'user': 'username', 'password': '********', 'port': '3260'}],
> 'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus':
> [{'type': 'iSCSI', 'physdev': 'sdb', 'capacity': '21474836480',
> 'state': 'active', 'lun': '1'}], 'devtype': 'iSCSI',
> 'physicalblocksize': '512', 'pvUUID': '', 'serial':
> 'SLIO-ORG_lun1_bdev_c581fee8-4da8-42c8-8414-305e003cd57c', 'GUID':
> '36001405c581fee84da842c88414305e0', 'productID': 'lun1_bdev'},
> {'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity':
> '21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '',
> 'pvsize': '', 'pathlist': [{'initiatorname': 'default', 'connection':
> '192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal':
> '1', 'user': 'username', 'password': '********', 'port': '3260'}],
> 'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus':
> [{'type': 'iSCSI', 'physdev': 'sde', 'capacity': '21474836480',
> 'state': 'active', 'lun': '4'}], 'devtype': 'iSCSI',
> 'physicalblocksize': '512', 'pvUUID': '', 'serial':
> 'SLIO-ORG_lun4_bdev_18c78046-b159-4d14-a65e-5f9ad3b5b4f2', 'GUID':
> '3600140518c78046b1594d14a65e5f9ad', 'productID': 'lun4_bdev'}]}
> (logUtils:52)
>
> 2016-12-18 07:54:44,567 INFO  (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC
> call Host.getDeviceList succeeded in 0.24 seconds (__init__:515)
>
> 2016-12-18 07:54:45,692 INFO  (jsonrpc/2) [dispatcher] Run and
> protect: getDeviceList(storageType=3,
> guids=[u'3600140518c78046b1594d14a65e5f9ad',
> u'360014052e4947abdee947b5b55c9e28a'], checkStatus=False, options={})
> (logUtils:49)
>
> 2016-12-18 07:54:45,882 WARN  (jsonrpc/2) [storage.LVM] lvm pvs
> failed: 5 [] ['  WARNING: Not using lvmetad because config setting
> use_lvmetad=0.', '  WARNING: To avoid corruption, rescan devices to
> make changes visible (pvscan --cache).', '  Failed to find physical
> volume "/dev/mapper/3600140518c78046b1594d14a65e5f9ad".'] (lvm:324)
> 2016-12-18 07:54:45,883 WARN  (jsonrpc/2) [storage.HSM] getPV failed
> for guid: 3600140518c78046b1594d14a65e5f9ad (hsm:1970)
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/hsm.py", line 1967, in _getDeviceList
>     pv = lvm.getPV(guid)
>   File "/usr/share/vdsm/storage/lvm.py", line 856, in getPV
>     raise se.InaccessiblePhysDev((pvName,))
> InaccessiblePhysDev: Multipath cannot access physical device(s):
> "devices=(u'3600140518c78046b1594d14a65e5f9ad',)"
> 2016-12-18 07:54:45,944 WARN  (jsonrpc/2) [storage.LVM] lvm pvs
> failed: 5 [] ['  WARNING: Not using lvmetad because config setting
> use_lvmetad=0.', '  WARNING: To avoid corruption, rescan devices to
> make changes visible (pvscan --cache).', '  Failed to find physical
> volume "/dev/mapper/360014052e4947abdee947b5b55c9e28a".'] (lvm:324)
> 2016-12-18 07:54:45,944 WARN  (jsonrpc/2) [storage.HSM] getPV failed
> for guid: 360014052e4947abdee947b5b55c9e28a (hsm:1970)
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/hsm.py", line 1967, in _getDeviceList
>     pv = lvm.getPV(guid)
>   File "/usr/share/vdsm/storage/lvm.py", line 856, in getPV
>     raise se.InaccessiblePhysDev((pvName,))
> InaccessiblePhysDev: Multipath cannot access physical device(s):
> "devices=(u'360014052e4947abdee947b5b55c9e28a',)"
> 2016-12-18 07:54:45,977 INFO  (jsonrpc/2) [dispatcher] Run and
> protect: getDeviceList, Return response: {'devList': [{'status':
> 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480', 'fwrev':
> '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '',
> 'pathlist': [{'initiatorname': 'default', 'connection':
> '192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal':
> '1', 'user': 'username', 'password': '********', 'port': '3260'}],
> 'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus':
> [{'type': 'iSCSI', 'physdev': 'sda', 'capacity': '21474836480',
> 'state': 'active', 'lun': '0'}], 'devtype': 'iSCSI',
> 'physicalblocksize': '512', 'pvUUID': '', 'serial':
> 'SLIO-ORG_lun0_bdev_2e4947ab-dee9-47b5-b55c-9e28a18d6831', 'GUID':
> '360014052e4947abdee947b5b55c9e28a', 'productID': 'lun0_bdev'},
> {'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity':
> '21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '',
> 'pvsize': '', 'pathlist': [{'initiatorname': 'default', 'connection':
> '192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal':
> '1', 'user': 'username', 'password': '********', 'port': '3260'}],
> 'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus':
> [{'type': 'iSCSI', 'physdev': 'sde', 'capacity': '21474836480',
> 'state': 'active', 'lun': '4'}], 'devtype': 'iSCSI',
> 'physicalblocksize': '512', 'pvUUID': '', 'serial':
> 'SLIO-ORG_lun4_bdev_18c78046-b159-4d14-a65e-5f9ad3b5b4f2', 'GUID':
> '3600140518c78046b1594d14a65e5f9ad', 'productID': 'lun4_bdev'}]}
> (logUtils:52)
> 2016-12-18 07:54:45,978 INFO  (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC
> call Host.getDeviceList succeeded in 0.29 seconds (__init__:515)
>
> Looks like a pv is not accessible - storage issue?
>
> Please check that the pvs are clean before using them for the test.
> Clearing the luns on the storage before the test can be a good idea.
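
To check this by hand on the host, something like the sketch below can be
used to see whether multipath exposes the two GUIDs from the log and whether
LVM sees them as physical volumes (the GUIDs are copied from the log; the
wrapper itself is just an illustration, not part of the test code):

    import subprocess

    # GUIDs taken from the getDeviceList/getPV messages above
    GUIDS = [
        '3600140518c78046b1594d14a65e5f9ad',
        '360014052e4947abdee947b5b55c9e28a',
    ]

    for guid in GUIDS:
        dev = '/dev/mapper/' + guid
        # Is the multipath device present at all?
        mp_rc = subprocess.call(['multipath', '-ll', guid])
        # Does LVM see it as a PV? This is roughly the check behind the
        # "Failed to find physical volume" warnings above.
        pv_rc = subprocess.call(['pvs', '--noheadings', '-o', 'pv_name', dev])
        print('%s multipath rc=%d pvs rc=%d' % (guid, mp_rc, pv_rc))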
>

They should ALWAYS be clean, as we start from scratch (including creation
of the devices on the storage) in every run.
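
If it ever turns out that they are not, a minimal sketch of what "clearing
the luns" between runs could look like is below (the device paths are
placeholders for whatever backing devices the setup scripts create on the
storage VM, not paths taken from this environment):

    import subprocess

    # Placeholder paths - substitute the real LUN backing devices created on
    # the storage VM by the setup scripts.
    LUN_DEVICES = ['/dev/vgX/lun0_bdev', '/dev/vgX/lun4_bdev']

    for dev in LUN_DEVICES:
        # Remove any filesystem/LVM/multipath signatures left by a previous run.
        subprocess.check_call(['wipefs', '-a', dev])
        # Also zero the first MiB to clear any stale metadata at the start
        # of the device.
        subprocess.check_call([
            'dd', 'if=/dev/zero', 'of=' + dev, 'bs=1M', 'count=1',
            'conv=fsync',
        ])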


>
> 2016-12-18 07:54:47,165 INFO  (jsonrpc/4) [dispatcher] Run and
> protect: createVG(vgname=u'097c3032-99da-4e18-b547-19ff340f846e',
> devlist=[u'3600140518c78046b1594d14a65e5f9ad',
> u'360014052e4947abdee947b5b55c9e28a'], force=False, options=None)
> (logUtils:49)
>
> Engine is trying to create a vg with the pv 360014052e4947abdee947b5b55c9e28a
> that failed before in getDeviceList.
>
> This looks like a bug in engine, ignoring the results of getDeviceList.
>
> 2016-12-18 07:54:47,552 INFO  (jsonrpc/4) [dispatcher] Run and
> protect: createVG, Return response: {'uuid':
> 'zJyjdF-AvTC-Z6wN-XNRO-1jDd-2SUH-dVQTJW'} (logUtils:52)
> 2016-12-18 07:54:47,552 INFO  (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC
> call LVMVolumeGroup.create succeeded in 0.38 seconds (__init__:515)
> 2016-12-18 07:54:48,562 INFO  (jsonrpc/5) [dispatcher] Run and
> protect: createStorageDomain(storageType=3,
> sdUUID=u'097c3032-99da-4e18-b547-19ff340f846e', domainName=u'iscsi',
> typeSpecificArg=u'zJyjdF-AvTC-Z6wN-XNRO-1jDd-2SUH-dVQTJW', domClass=1,
> domVersion=u'3', options=None) (logUtils:49)
> 2016-12-18 07:54:48,563 ERROR (jsonrpc/5) [storage.StorageDomainCache]
> looking for unfetched domain 097c3032-99da-4e18-b547-19ff340f846e
> (sdc:151)
> 2016-12-18 07:54:48,563 ERROR (jsonrpc/5) [storage.StorageDomainCache]
> looking for domain 097c3032-99da-4e18-b547-19ff340f846e (sdc:168)
> 2016-12-18 07:54:48,564 ERROR (jsonrpc/5) [storage.StorageDomainCache]
> domain 097c3032-99da-4e18-b547-19ff340f846e not found (sdc:157)
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/sdc.py", line 155, in _findDomain
>     dom = findMethod(sdUUID)
>   File "/usr/share/vdsm/storage/sdc.py", line 185, in _findUnfetchedDomain
>     raise se.StorageDomainDoesNotExist(sdUUID)
> StorageDomainDoesNotExist: Storage domain does not exist:
> (u'097c3032-99da-4e18-b547-19ff340f846e',)
>
> We see a lot of these errors in the rest of the log. This means something
> is wrong with this vg.
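
One quick host-side check, independent of engine, would be whether the VG
that createVG reported actually exists in LVM (the VG name is copied from the
log above; the snippet itself is just a manual sanity check, not something
the test performs):

    import subprocess

    # VG name reported by createVG in the log above
    VG_NAME = '097c3032-99da-4e18-b547-19ff340f846e'

    # Show the VG, its UUID and its PVs as LVM sees them on the host.
    subprocess.check_call(['vgs', '-o', 'vg_name,vg_uuid,pv_name', VG_NAME])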
>
> Needs deeper investigation from a storage developer on both the engine and
> vdsm side, but I would start by making sure we use clean luns. We are not
> trying to test esoteric negative flows in the system tests.
>
> Did we change something in the system tests project or lago while we
> were not looking?
>

CentOS 7.3... ?
Y.


>
> Can we reproduce this issue manually with same engine and vdsm versions?
>
> Nir
>
> >
> > Please have a look.
> >
> > Thanks,
> > Barak.
> >
> >
> > --
> > Barak Korren
> > [email protected]
> > RHCE, RHCi, RHV-DevOps Team
> > https://ifireball.wordpress.com/
> > _______________________________________________
> > Infra mailing list
> > [email protected]
> > http://lists.ovirt.org/mailman/listinfo/infra
>
_______________________________________________
Devel mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/devel
