On Sun, Dec 18, 2016 at 5:26 PM, Nir Soffer <[email protected]> wrote:
> On Sun, Dec 18, 2016 at 4:17 PM, Barak Korren <[email protected]> wrote:
> > After we've fixed the various system test issues that arose from the
> > CentOS 7.3 release, we're now seeing a new failure that seems to have
> > to do with snapshot merges.
> >
> > I'm guessing this may have to do with something that went in last week
> > while we "weren't looking".
> >
> > Failing job can be seen here:
> > http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/4275
> >
> > The test code snippet that is failing is as follows:
> >
> > 226     api.vms.get(VM0_NAME).snapshots.list()[-2].delete()
> > 227     testlib.assert_true_within_short(
> > 228         lambda:
> > 229         (len(api.vms.get(VM0_NAME).snapshots.list()) == 2) and
> > 230         (api.vms.get(VM0_NAME).snapshots.list()[-1].snapshot_status
> > 231          == 'ok'),
> > 232     )
> >
> > The failure itself is a test timeout:
> >
> > ...
> >   File "/home/jenkins/workspace/test-repo_ovirt_experimental_master/ovirt-system-tests/basic-suite-master/test-scenarios/004_basic_sanity.py", line 228, in snapshot_merge
> >     lambda:
> >   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 248, in assert_true_within_short
> >     allowed_exceptions=allowed_exceptions,
> >   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 240, in assert_true_within
> >     raise AssertionError('Timed out after %s seconds' % timeout)
> > AssertionError: Timed out after 180 seconds
> >
> > Engine log generated during the test can be found here:
> >
> > http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/4275/artifact/exported-artifacts/basic_suite_master.sh-el7/exported-artifacts/test_logs/basic-suite-master/post-004_basic_sanity.py/lago-basic-suite-master-engine/_var_log_ovirt-engine/engine.log
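
Just to spell out what the test is actually waiting for:
testlib.assert_true_within_short() simply polls the lambda until it
returns True or the short timeout (180 seconds) expires. Roughly this,
as a sketch (simplified names and a made-up 3 second poll interval, not
the actual ovirtlago.testlib code):

    import time

    def assert_true_within(condition, timeout=180, interval=3):
        # Poll the condition until it holds, or give up after the timeout.
        deadline = time.time() + timeout
        while time.time() < deadline:
            if condition():
                return
            time.sleep(interval)
        raise AssertionError('Timed out after %s seconds' % timeout)

So the failure above only means the VM never reached "2 snapshots, last
one in status 'ok'" within 3 minutes - it says nothing yet about why.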

> Quickly looking in the engine log does not reveal anything useful.
> Understanding issues like this requires the vdsm log from the host
> performing the operation.
>
> Looking in vdsm log:
>
> http://jenkins.ovirt.org/job/test-repo_ovirt_experimental_master/4275/artifact/exported-artifacts/basic_suite_master.sh-el7/exported-artifacts/test_logs/basic-suite-master/post-004_basic_sanity.py/lago-basic-suite-master-host1/_var_log_vdsm/vdsm.log
>
> 2016-12-18 07:54:44,325 INFO (jsonrpc/1) [dispatcher] Run and protect: getDeviceList(storageType=3, guids=(), checkStatus=False, options={}) (logUtils:49)
>
> 2016-12-18 07:54:44,565 INFO (jsonrpc/1) [dispatcher] Run and protect: getDeviceList, Return response: {'devList': [
>   {'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '', 'pathlist': [{'initiatorname': 'default', 'connection': '192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': 'username', 'password': '********', 'port': '3260'}], 'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus': [{'type': 'iSCSI', 'physdev': 'sdd', 'capacity': '21474836480', 'state': 'active', 'lun': '3'}], 'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '', 'serial': 'SLIO-ORG_lun3_bdev_b94d3d26-d6e4-4bfc-9d91-4898731b721f', 'GUID': '36001405b94d3d26d6e44bfc9d9148987', 'productID': 'lun3_bdev'},
>   {'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '', 'pathlist': [{'initiatorname': 'default', 'connection': '192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': 'username', 'password': '********', 'port': '3260'}], 'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus': [{'type': 'iSCSI', 'physdev': 'sdc', 'capacity': '21474836480', 'state': 'active', 'lun': '2'}], 'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '', 'serial': 'SLIO-ORG_lun2_bdev_eddd3ce8-2ea8-4326-a9a8-ebc609a768db', 'GUID': '36001405eddd3ce82ea84326a9a8ebc60', 'productID': 'lun2_bdev'},
>   {'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '', 'pathlist': [{'initiatorname': 'default', 'connection': '192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': 'username', 'password': '********', 'port': '3260'}], 'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus': [{'type': 'iSCSI', 'physdev': 'sda', 'capacity': '21474836480', 'state': 'active', 'lun': '0'}], 'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '', 'serial': 'SLIO-ORG_lun0_bdev_2e4947ab-dee9-47b5-b55c-9e28a18d6831', 'GUID': '360014052e4947abdee947b5b55c9e28a', 'productID': 'lun0_bdev'},
>   {'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '', 'pathlist': [{'initiatorname': 'default', 'connection': '192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': 'username', 'password': '********', 'port': '3260'}], 'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus': [{'type': 'iSCSI', 'physdev': 'sdb', 'capacity': '21474836480', 'state': 'active', 'lun': '1'}], 'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '', 'serial': 'SLIO-ORG_lun1_bdev_c581fee8-4da8-42c8-8414-305e003cd57c', 'GUID': '36001405c581fee84da842c88414305e0', 'productID': 'lun1_bdev'},
>   {'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '', 'pathlist': [{'initiatorname': 'default', 'connection': '192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': 'username', 'password': '********', 'port': '3260'}], 'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus': [{'type': 'iSCSI', 'physdev': 'sde', 'capacity': '21474836480', 'state': 'active', 'lun': '4'}], 'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '', 'serial': 'SLIO-ORG_lun4_bdev_18c78046-b159-4d14-a65e-5f9ad3b5b4f2', 'GUID': '3600140518c78046b1594d14a65e5f9ad', 'productID': 'lun4_bdev'}]} (logUtils:52)
> 2016-12-18 07:54:44,567 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC call Host.getDeviceList succeeded in 0.24 seconds (__init__:515)
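
Side note: all five LUNs in that reply come back with empty 'pvUUID' and
'vgUUID', i.e. at this point LVM on the host does not see any leftover
metadata on them. If someone wants to check that programmatically from a
parsed reply, something like this (illustrative helper, not existing
code) does it:

    def used_devices(dev_list):
        # Devices that already carry LVM metadata according to vdsm.
        return [d['GUID'] for d in dev_list if d['pvUUID'] or d['vgUUID']]

For the devList above it returns an empty list.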

> 2016-12-18 07:54:45,692 INFO (jsonrpc/2) [dispatcher] Run and protect: getDeviceList(storageType=3, guids=[u'3600140518c78046b1594d14a65e5f9ad', u'360014052e4947abdee947b5b55c9e28a'], checkStatus=False, options={}) (logUtils:49)
> 2016-12-18 07:54:45,882 WARN (jsonrpc/2) [storage.LVM] lvm pvs failed: 5 [] [' WARNING: Not using lvmetad because config setting use_lvmetad=0.', ' WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).', ' Failed to find physical volume "/dev/mapper/3600140518c78046b1594d14a65e5f9ad".'] (lvm:324)
> 2016-12-18 07:54:45,883 WARN (jsonrpc/2) [storage.HSM] getPV failed for guid: 3600140518c78046b1594d14a65e5f9ad (hsm:1970)
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/hsm.py", line 1967, in _getDeviceList
>     pv = lvm.getPV(guid)
>   File "/usr/share/vdsm/storage/lvm.py", line 856, in getPV
>     raise se.InaccessiblePhysDev((pvName,))
> InaccessiblePhysDev: Multipath cannot access physical device(s): "devices=(u'3600140518c78046b1594d14a65e5f9ad',)"
> 2016-12-18 07:54:45,944 WARN (jsonrpc/2) [storage.LVM] lvm pvs failed: 5 [] [' WARNING: Not using lvmetad because config setting use_lvmetad=0.', ' WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).', ' Failed to find physical volume "/dev/mapper/360014052e4947abdee947b5b55c9e28a".'] (lvm:324)
> 2016-12-18 07:54:45,944 WARN (jsonrpc/2) [storage.HSM] getPV failed for guid: 360014052e4947abdee947b5b55c9e28a (hsm:1970)
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/hsm.py", line 1967, in _getDeviceList
>     pv = lvm.getPV(guid)
>   File "/usr/share/vdsm/storage/lvm.py", line 856, in getPV
>     raise se.InaccessiblePhysDev((pvName,))
> InaccessiblePhysDev: Multipath cannot access physical device(s): "devices=(u'360014052e4947abdee947b5b55c9e28a',)"
> 2016-12-18 07:54:45,977 INFO (jsonrpc/2) [dispatcher] Run and protect: getDeviceList, Return response: {'devList': [
>   {'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '', 'pathlist': [{'initiatorname': 'default', 'connection': '192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': 'username', 'password': '********', 'port': '3260'}], 'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus': [{'type': 'iSCSI', 'physdev': 'sda', 'capacity': '21474836480', 'state': 'active', 'lun': '0'}], 'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '', 'serial': 'SLIO-ORG_lun0_bdev_2e4947ab-dee9-47b5-b55c-9e28a18d6831', 'GUID': '360014052e4947abdee947b5b55c9e28a', 'productID': 'lun0_bdev'},
>   {'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '', 'pathlist': [{'initiatorname': 'default', 'connection': '192.168.200.3', 'iqn': 'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': 'username', 'password': '********', 'port': '3260'}], 'logicalblocksize': '512', 'discard_max_bytes': 0, 'pathstatus': [{'type': 'iSCSI', 'physdev': 'sde', 'capacity': '21474836480', 'state': 'active', 'lun': '4'}], 'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '', 'serial': 'SLIO-ORG_lun4_bdev_18c78046-b159-4d14-a65e-5f9ad3b5b4f2', 'GUID': '3600140518c78046b1594d14a65e5f9ad', 'productID': 'lun4_bdev'}]} (logUtils:52)
> 2016-12-18 07:54:45,978 INFO (jsonrpc/2) [jsonrpc.JsonRpcServer] RPC call Host.getDeviceList succeeded in 0.29 seconds (__init__:515)
>
> Looks like a pv is not accessible - storage issue?
>
> Please check that the pvs are clean before using them for the test.
> Clearing the luns on the storage before the test can be a good idea.

They should ALWAYS be clean, as we start from scratch (including creation
of the devices on the storage) in every run.
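
That said, it would not hurt to also wipe the LUNs explicitly in the
suite setup, so leftovers can be ruled out completely. A rough sketch of
what I mean (the GUIDs here are just the two from this run; the real
list would come from the suite's LUN configuration, and the helper name
is made up):

    import subprocess

    LUN_GUIDS = [
        '3600140518c78046b1594d14a65e5f9ad',
        '360014052e4947abdee947b5b55c9e28a',
    ]

    def wipe_luns(guids):
        for guid in guids:
            dev = '/dev/mapper/%s' % guid
            # Drop any leftover LVM/filesystem signatures on the LUN.
            subprocess.check_call(['wipefs', '--all', dev])
            # Zero the first MiBs as well, to be safe about stale metadata.
            subprocess.check_call(
                ['dd', 'if=/dev/zero', 'of=%s' % dev,
                 'bs=1M', 'count=10', 'oflag=direct', 'conv=fsync'])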

> 2016-12-18 07:54:47,165 INFO (jsonrpc/4) [dispatcher] Run and protect: createVG(vgname=u'097c3032-99da-4e18-b547-19ff340f846e', devlist=[u'3600140518c78046b1594d14a65e5f9ad', u'360014052e4947abdee947b5b55c9e28a'], force=False, options=None) (logUtils:49)
>
> Engine is trying to create a vg with the pv 360014052e4947abdee947b5b55c9e28a
> that failed before in getDeviceList.
>
> This looks like a bug in engine, ignoring the results of getDeviceList.
>
> 2016-12-18 07:54:47,552 INFO (jsonrpc/4) [dispatcher] Run and protect: createVG, Return response: {'uuid': 'zJyjdF-AvTC-Z6wN-XNRO-1jDd-2SUH-dVQTJW'} (logUtils:52)
> 2016-12-18 07:54:47,552 INFO (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call LVMVolumeGroup.create succeeded in 0.38 seconds (__init__:515)
> 2016-12-18 07:54:48,562 INFO (jsonrpc/5) [dispatcher] Run and protect: createStorageDomain(storageType=3, sdUUID=u'097c3032-99da-4e18-b547-19ff340f846e', domainName=u'iscsi', typeSpecificArg=u'zJyjdF-AvTC-Z6wN-XNRO-1jDd-2SUH-dVQTJW', domClass=1, domVersion=u'3', options=None) (logUtils:49)
> 2016-12-18 07:54:48,563 ERROR (jsonrpc/5) [storage.StorageDomainCache] looking for unfetched domain 097c3032-99da-4e18-b547-19ff340f846e (sdc:151)
> 2016-12-18 07:54:48,563 ERROR (jsonrpc/5) [storage.StorageDomainCache] looking for domain 097c3032-99da-4e18-b547-19ff340f846e (sdc:168)
> 2016-12-18 07:54:48,564 ERROR (jsonrpc/5) [storage.StorageDomainCache] domain 097c3032-99da-4e18-b547-19ff340f846e not found (sdc:157)
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/sdc.py", line 155, in _findDomain
>     dom = findMethod(sdUUID)
>   File "/usr/share/vdsm/storage/sdc.py", line 185, in _findUnfetchedDomain
>     raise se.StorageDomainDoesNotExist(sdUUID)
> StorageDomainDoesNotExist: Storage domain does not exist: (u'097c3032-99da-4e18-b547-19ff340f846e',)
>
> We see a lot of these errors in the rest of the log. This means something
> is wrong with this vg.
>
> Needs deeper investigation from a storage developer on both engine and vdsm
> side, but I would start by making sure we use clean luns. We are not trying
> to test esoteric negative flows in the system tests.
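
For whoever picks this up: the vdsm-side failure is simply 'pvs' not
finding a physical volume on the multipath device (the rc 5 in the log
above), so it can be checked directly on the host while the environment
is still up. A small illustrative helper - it only mirrors what vdsm's
lvm.getPV() does, it is not vdsm code:

    import subprocess

    def pv_visible(guid):
        # Ask LVM whether the multipath device is a known physical volume.
        # In the log above this failed with rc 5 and "Failed to find
        # physical volume", i.e. LVM sees no PV on the device.
        cmd = ['pvs', '--noheadings', '-o', 'pv_name,vg_name',
               '/dev/mapper/%s' % guid]
        return subprocess.call(cmd) == 0

    print(pv_visible('360014052e4947abdee947b5b55c9e28a'))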

> Did we change something in the system tests project or lago while we
> were not looking?

CentOS 7.3... ?
Y.

> Can we reproduce this issue manually with same engine and vdsm versions?
>
> Nir
>
> > Please have a look.
> >
> > Thanks,
> > Barak.
> >
> > --
> > Barak Korren
> > [email protected]
> > RHCE, RHCi, RHV-DevOps Team
> > https://ifireball.wordpress.com/
