On Mon, Nov 20, 2017 at 3:10 PM, Dafna Ron <[email protected]> wrote:

> Hi,
>
> We had a failure in OST for test 002_bootstrap.verify_add_all_hosts.
>
> From the logs I can see that vdsm on host0 was reporting that it could not
> find the physical volume, but eventually the storage was created and is
> reported as responsive.
>
> However, Host1 is reported to have become non-operational with a "storage
> domain does not exist" error, and I think that there is a race.
>
I've opened https://bugzilla.redhat.com/show_bug.cgi?id=1514906 on this.

> I think that we create the storage domain while host1 is being installed,
> and if the domain is not created and reported as activated in time, host1
> will become nonOperational.

And based on the above description, this is exactly the issue I've
described in the BZ.
Y.

> Are we starting installation of host1 before host0 and the storage are
> active?
>
> Link to suspected patches (I do not think the reported patch is related
> to the error):
>
> https://gerrit.ovirt.org/#/c/84133/
>
> Link to job:
>
> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3902/
>
> Link to all logs:
>
> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3902/artifact/
>
> (Relevant) error snippet from the log:
>
> <error>
>
> Lago log:
>
> 2017-11-18 11:15:25,472::log_utils.py::end_log_task::670::nose::INFO:: # add_master_storage_domain: Success (in 0:01:09)
> 2017-11-18 11:15:25,472::log_utils.py::start_log_task::655::nose::INFO:: # add_secondary_storage_domains:
> 2017-11-18 11:16:47,455::log_utils.py::end_log_task::670::nose::INFO:: # add_secondary_storage_domains: Success (in 0:01:21)
> 2017-11-18 11:16:47,456::log_utils.py::start_log_task::655::nose::INFO:: # import_templates:
> 2017-11-18 11:16:47,513::testlib.py::stopTest::198::nose::INFO:: * SKIPPED: Exported domain generation not supported yet
> 2017-11-18 11:16:47,514::log_utils.py::end_log_task::670::nose::INFO:: # import_templates: Success (in 0:00:00)
> 2017-11-18 11:16:47,514::log_utils.py::start_log_task::655::nose::INFO:: # verify_add_all_hosts:
> 2017-11-18 11:16:47,719::testlib.py::assert_equals_within::227::ovirtlago.testlib::ERROR:: * Unhandled exception in <function <lambda> at 0x2909230>
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 219, in assert_equals_within
>     res = func()
>   File "/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py", line 430, in <lambda>
>     lambda: _all_hosts_up(hosts_service, total_hosts)
>   File "/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py", line 129, in _all_hosts_up
>     _check_problematic_hosts(hosts_service)
>   File "/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py", line 149, in _check_problematic_hosts
>     raise RuntimeError(dump_hosts)
> RuntimeError: 1 hosts failed installation:
> lago-basic-suite-master-host-1: non_operational
>
> 2017-11-18 11:16:47,722::utils.py::wrapper::480::lago.utils::DEBUG::Looking for a workdir
> 2017-11-18 11:16:47,722::workdir.py::resolve_workdir_path::361::lago.workdir::DEBUG::Checking if /dev/shm/ost/deployment-basic-suite-master is a workdir
> 2017-11-18 11:16:47,724::log_utils.py::__enter__::600::lago.prefix::INFO:: * Collect artifacts:
> 2017-11-18 11:16:47,724::log_utils.py::__enter__::600::lago.prefix::INFO:: * Collect artifacts:
>
> vdsm host0:
>
> 2017-11-18 06:14:23,980-0500 INFO (jsonrpc/0) [vdsm.api] START getDeviceList(storageType=3, guids=[u'360014059618895272774e97a2aaf5dd6'], checkStatus=False, options={}) from=::ffff:192.168.201.4,45636, flow_id=ed8310a1-a7af-4a67-b351-8ff364766b8a, task_id=6ced0092-34cd-49f0-aa0f-6aae498af37f (api:46)
> 2017-11-18 06:14:24,353-0500 WARN (jsonrpc/0) [storage.LVM] lvm pvs failed: 5 [] ['  Failed to find physical volume "/dev/mapper/360014059618895272774e97a2aaf5dd6".'] (lvm:322)
> 2017-11-18 06:14:24,353-0500 WARN (jsonrpc/0) [storage.HSM] getPV failed for guid: 360014059618895272774e97a2aaf5dd6 (hsm:1973)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1970, in _getDeviceList
>     pv = lvm.getPV(guid)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 852, in getPV
>     raise se.InaccessiblePhysDev((pvName,))
> InaccessiblePhysDev: Multipath cannot access physical device(s): "devices=(u'360014059618895272774e97a2aaf5dd6',)"
> 2017-11-18 06:14:24,389-0500 INFO (jsonrpc/0) [vdsm.api] FINISH getDeviceList return={'devList': [{'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '', 'pathlist': [{'initiatorname': u'default', 'connection': u'192.168.200.4', 'iqn': u'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': u'username', 'password': '********', 'port': '3260'}, {'initiatorname': u'default', 'connection': u'192.168.201.4', 'iqn': u'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': u'username', 'password': '********', 'port': '3260'}], 'logicalblocksize': '512', 'discard_max_bytes': 1073741824, 'pathstatus': [{'type': 'iSCSI', 'physdev': 'sda', 'capacity': '21474836480', 'state': 'active', 'lun': '0'}, {'type': 'iSCSI', 'physdev': 'sdf', 'capacity': '21474836480', 'state': 'active', 'lun': '0'}], 'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '', 'serial': 'SLIO-ORG_lun0_bdev_96188952-7277-4e97-a2aa-f5dd6aad6fc2', 'GUID': '360014059618895272774e97a2aaf5dd6', 'productID': 'lun0_bdev'}]} from=::ffff:192.168.201.4,45636, flow_id=ed8310a1-a7af-4a67-b351-8ff364766b8a, task_id=6ced0092-34cd-49f0-aa0f-6aae498af37f (api:52)
>
> 2017-11-18 06:14:31,788-0500 INFO (jsonrpc/0) [vdsm.api] FINISH getStorageDomainInfo return={'info': {'uuid': 'cc61e074-a3b6-4371-9185-66079a39f123', 'vgMetadataDevice': '360014059618895272774e97a2aaf5dd6', 'vguuid': '7ifbmt-0elj-uWZZ-zSLG-plA8-8hd3-JG298b', 'metadataDevice': '360014059618895272774e97a2aaf5dd6', 'state': 'OK', 'version': '4', 'role': 'Regular', 'type': 'ISCSI', 'class': 'Data', 'pool': [], 'name': 'iscsi'}} from=::ffff:192.168.201.4,45636, flow_id=2c187699, task_id=c2080b61-d4a5-4bdb-9d75-f81580a8257a (api:
>
> vdsm host1:
>
> 2017-11-18 06:16:34,315-0500 ERROR (monitor/c65437c) [storage.Monitor] Setting up monitor for c65437ce-339f-4b01-aeb5-45c1d486bf49 failed (monitor:329)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 326, in _setupLoop
>     self._setupMonitor()
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 348, in _setupMonitor
>     self._produceDomain()
>   File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 177, in wrapper
>     value = meth(self, *a, **kw)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 366, in _produceDomain
>     self.domain = sdCache.produce(self.sdUUID)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 110, in produce
>     domain.getRealDomain()
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
>     return self._cache._realProduce(self._sdUUID)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
>     domain = self._findDomain(sdUUID)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
>     return findMethod(sdUUID)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 176, in _findUnfetchedDomain
>     raise se.StorageDomainDoesNotExist(sdUUID)
> StorageDomainDoesNotExist: Storage domain does not exist: (u'c65437ce-339f-4b01-aeb5-45c1d486bf49',)
> 2017-11-18 06:16:40,377-0500 INFO (jsonrpc/7) [api.host] START getStats() from=::ffff:192.168.201.4,58722 (api:46)
> 2017-11-18 06:16:40,378-0500 INFO (jsonrpc/7) [vdsm.api] START repoStats(domains=()) from=::ffff:192.168.201.4,58722, task_id=8fb74944-08c0-491e-ad55-a7a9f0a11ef8 (api:46)
> 2017-11-18 06:16:40,379-0500 INFO (jsonrpc/7) [vdsm.api] FINISH repoStats return={u'c65437ce-339f-4b01-aeb5-45c1d486bf49': {'code': 358, 'actual': True, 'version': -1, 'acquired': False, 'delay': '0', 'lastCheck': '6.1', 'valid': False}, u'cc61e074-a3b6-4371-9185-66079a39f123': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.00103987', 'lastCheck': '6.5', 'valid': True}} from=::ffff:192.168.201.4,58722, task_id=8fb74944-08c0-491e-ad55-a7a9f0a11ef8 (api:52)
>
> engine log:
>
> 2017-11-18 06:15:54,040-05 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-29) [4ce8aff3] Domain 'c65437ce-339f-4b01-aeb5-45c1d486bf49:nfs' was reported with error code '358'
> 2017-11-18 06:15:54,041-05 ERROR [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (EE-ManagedThreadFactory-engine-Thread-29) [4ce8aff3] Storage Domain 'nfs' of pool 'test-dc' is in problem in host 'lago-basic-suite-master-host-1'
> 2017-11-18 06:15:54,045-05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-29) [4ce8aff3] EVENT_ID: VDS_STORAGE_VDS_STATS_FAILED(189), Host lago-basic-suite-master-host-1 reports about one of the Active Storage Domains as Problematic.
>
> </error>
>
> _______________________________________________
> Devel mailing list
> [email protected]
> http://lists.ovirt.org/mailman/listinfo/devel
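The race described above — host1 finishing installation before the newly created domains are reported active — could be closed by gating the add-host step on domain status. A minimal sketch, assuming a polling helper in the spirit of ovirtlago.testlib.assert_equals_within; the helper names and the get_statuses callback are illustrative, not the suite's actual API:

```python
import time


def wait_for(predicate, timeout=120, interval=2):
    """Poll predicate() until it returns True or the timeout expires.

    Hypothetical stand-in for ovirtlago.testlib.assert_equals_within.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    raise RuntimeError('condition not met within %ss' % timeout)


def all_domains_active(get_statuses):
    """True when every attached storage domain reports 'active'.

    get_statuses would query the engine for the statuses of all
    domains attached to the data center (illustrative callback).
    """
    statuses = get_statuses()
    return bool(statuses) and all(s == 'active' for s in statuses)


# Ordering fix sketched: only start installing host1 once the master
# and secondary domains are active, instead of racing the installation:
#
#   wait_for(lambda: all_domains_active(engine_sd_statuses))
#   add_host('lago-basic-suite-master-host-1')
```

This would serialize the test steps; the alternative is keeping them concurrent and having verify_add_all_hosts tolerate a transient nonOperational state while the domains activate.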
