On 11/20/2017 01:16 PM, Yaniv Kaul wrote:
> On Mon, Nov 20, 2017 at 3:10 PM, Dafna Ron <[email protected]> wrote:
>
>> Hi,
>>
>> We had a failure in OST for test 002_bootstrap.verify_add_all_hosts.
>>
>> From the logs I can see that vdsm on host0 was reporting that it
>> cannot find the physical volume, but eventually the storage was
>> created and is reported as responsive.
>>
>> However, Host1 is reported to have become non-operational with a
>> "storage domain does not exist" error, and I think that there is a race.
>
> I've opened https://bugzilla.redhat.com/show_bug.cgi?id=1514906 on this.
>
>> I think that we create the storage domain while host1 is being
>> installed, and if the domain is not created and reported as
>> activated in time, host1 will become nonOperational.
>
> And based on the above description, this is exactly the issue I've
> described in the BZ.
> Y.

+1, this seems like exactly the same issue.
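
If the ordering is indeed the problem, one way to close the race on the test side would be to gate verify_add_all_hosts (or the host deployment step itself) on the DC's storage domains being active. A minimal sketch with ovirtsdk4, just to illustrate the ordering; the helper name and timeout are made up, 'test-dc' is taken from the engine log below, and this is not the suite's actual code:

import time

import ovirtsdk4.types as types


def wait_for_active_storage_domains(connection, dc_name='test-dc', timeout=300):
    # 'connection' is an ovirtsdk4.Connection to the engine.
    # Poll until every storage domain attached to the DC reports ACTIVE,
    # so a host finishing installation never sees a half-activated domain.
    dcs_service = connection.system_service().data_centers_service()
    dc = dcs_service.list(search='name=%s' % dc_name)[0]
    attached_sds = dcs_service.data_center_service(dc.id).storage_domains_service()

    deadline = time.time() + timeout
    while time.time() < deadline:
        sds = attached_sds.list()
        if sds and all(sd.status == types.StorageDomainStatus.ACTIVE for sd in sds):
            return
        time.sleep(3)
    raise RuntimeError('storage domains not active after %d seconds' % timeout)

Calling something like that before the hosts are verified would turn this race into an ordinary wait instead of a non-operational host.
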
>> are we starting installation of host1 before host0 and storage are
>> active?
>>
>> Link to suspected patches: I do not think that the patch reported
>> is related to the error
>>
>> https://gerrit.ovirt.org/#/c/84133/
>>
>> Link to Job:
>>
>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3902/
>>
>> Link to all logs:
>>
>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3902/artifact/
>>
>> (Relevant) error snippet from the log:
>>
>> <error>
>>
>> Lago log:
>>
>> 2017-11-18 11:15:25,472::log_utils.py::end_log_task::670::nose::INFO:: # add_master_storage_domain: Success (in 0:01:09)
>> 2017-11-18 11:15:25,472::log_utils.py::start_log_task::655::nose::INFO:: # add_secondary_storage_domains:
>> 2017-11-18 11:16:47,455::log_utils.py::end_log_task::670::nose::INFO:: # add_secondary_storage_domains: Success (in 0:01:21)
>> 2017-11-18 11:16:47,456::log_utils.py::start_log_task::655::nose::INFO:: # import_templates:
>> 2017-11-18 11:16:47,513::testlib.py::stopTest::198::nose::INFO:: * SKIPPED: Exported domain generation not supported yet
>> 2017-11-18 11:16:47,514::log_utils.py::end_log_task::670::nose::INFO:: # import_templates: Success (in 0:00:00)
>> 2017-11-18 11:16:47,514::log_utils.py::start_log_task::655::nose::INFO:: # verify_add_all_hosts:
>> 2017-11-18 11:16:47,719::testlib.py::assert_equals_within::227::ovirtlago.testlib::ERROR:: * Unhandled exception in <function <lambda> at 0x2909230>
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 219, in assert_equals_within
>>     res = func()
>>   File "/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py", line 430, in <lambda>
>>     lambda: _all_hosts_up(hosts_service, total_hosts)
>>   File "/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py", line 129, in _all_hosts_up
>>     _check_problematic_hosts(hosts_service)
>>   File "/home/jenkins/workspace/ovirt-master_change-queue-tester/ovirt-system-tests/basic-suite-master/test-scenarios/002_bootstrap.py", line 149, in _check_problematic_hosts
>>     raise RuntimeError(dump_hosts)
>> RuntimeError: 1 hosts failed installation:
>> lago-basic-suite-master-host-1: non_operational
>>
>> 2017-11-18 11:16:47,722::utils.py::wrapper::480::lago.utils::DEBUG::Looking for a workdir
>> 2017-11-18 11:16:47,722::workdir.py::resolve_workdir_path::361::lago.workdir::DEBUG::Checking if /dev/shm/ost/deployment-basic-suite-master is a workdir
>> 2017-11-18 11:16:47,724::log_utils.py::__enter__::600::lago.prefix::INFO:: * Collect artifacts:
>> 2017-11-18 11:16:47,724::log_utils.py::__enter__::600::lago.prefix::INFO:: * Collect artifacts:
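
One more observation on why this surfaces as a hard failure rather than a timeout: judging by the traceback, _check_problematic_hosts() raises as soon as any host reports a problematic state, so the assert_equals_within() retry loop never gets a chance to see host-1 recover once the domain finishes activating. Roughly this shape (my reconstruction from the traceback, not the actual 002_bootstrap.py code; the state list is a guess):

import ovirtsdk4.types as types

# Host states treated as "failed installation" (guessed list).
PROBLEMATIC_STATES = (
    types.HostStatus.NON_OPERATIONAL,
    types.HostStatus.INSTALL_FAILED,
)


def _check_problematic_hosts(hosts_service):
    problematic = [h for h in hosts_service.list()
                   if h.status in PROBLEMATIC_STATES]
    if problematic:
        dump_hosts = '%d hosts failed installation:\n' % len(problematic)
        dump_hosts += '\n'.join('%s: %s' % (h.name, h.status.value)
                                for h in problematic)
        # Raising here aborts the surrounding assert_equals_within() loop
        # immediately, even if host-1 would have flipped to Up a few
        # seconds later once the new domain was active.
        raise RuntimeError(dump_hosts)

If we do not want to reorder the suite, treating non_operational as transient there (and only failing on timeout) would also avoid this.
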
>>
>> vdsm host0:
>>
>> 2017-11-18 06:14:23,980-0500 INFO (jsonrpc/0) [vdsm.api] START getDeviceList(storageType=3, guids=[u'360014059618895272774e97a2aaf5dd6'], checkStatus=False, options={}) from=::ffff:192.168.201.4,45636, flow_id=ed8310a1-a7af-4a67-b351-8ff364766b8a, task_id=6ced0092-34cd-49f0-aa0f-6aae498af37f (api:46)
>> 2017-11-18 06:14:24,353-0500 WARN (jsonrpc/0) [storage.LVM] lvm pvs failed: 5 [] [' Failed to find physical volume "/dev/mapper/360014059618895272774e97a2aaf5dd6".'] (lvm:322)
>> 2017-11-18 06:14:24,353-0500 WARN (jsonrpc/0) [storage.HSM] getPV failed for guid: 360014059618895272774e97a2aaf5dd6 (hsm:1973)
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 1970, in _getDeviceList
>>     pv = lvm.getPV(guid)
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/lvm.py", line 852, in getPV
>>     raise se.InaccessiblePhysDev((pvName,))
>> InaccessiblePhysDev: Multipath cannot access physical device(s): "devices=(u'360014059618895272774e97a2aaf5dd6',)"
>> 2017-11-18 06:14:24,389-0500 INFO (jsonrpc/0) [vdsm.api] FINISH getDeviceList return={'devList': [{'status': 'unknown', 'vendorID': 'LIO-ORG', 'capacity': '21474836480', 'fwrev': '4.0', 'discard_zeroes_data': 0, 'vgUUID': '', 'pvsize': '', 'pathlist': [{'initiatorname': u'default', 'connection': u'192.168.200.4', 'iqn': u'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': u'username', 'password': '********', 'port': '3260'}, {'initiatorname': u'default', 'connection': u'192.168.201.4', 'iqn': u'iqn.2014-07.org.ovirt:storage', 'portal': '1', 'user': u'username', 'password': '********', 'port': '3260'}], 'logicalblocksize': '512', 'discard_max_bytes': 1073741824, 'pathstatus': [{'type': 'iSCSI', 'physdev': 'sda', 'capacity': '21474836480', 'state': 'active', 'lun': '0'}, {'type': 'iSCSI', 'physdev': 'sdf', 'capacity': '21474836480', 'state': 'active', 'lun': '0'}], 'devtype': 'iSCSI', 'physicalblocksize': '512', 'pvUUID': '', 'serial': 'SLIO-ORG_lun0_bdev_96188952-7277-4e97-a2aa-f5dd6aad6fc2', 'GUID': '360014059618895272774e97a2aaf5dd6', 'productID': 'lun0_bdev'}]} from=::ffff:192.168.201.4,45636, flow_id=ed8310a1-a7af-4a67-b351-8ff364766b8a, task_id=6ced0092-34cd-49f0-aa0f-6aae498af37f (api:52)
>>
>> 2017-11-18 06:14:31,788-0500 INFO (jsonrpc/0) [vdsm.api] FINISH getStorageDomainInfo return={'info': {'uuid': 'cc61e074-a3b6-4371-9185-66079a39f123', 'vgMetadataDevice': '360014059618895272774e97a2aaf5dd6', 'vguuid': '7ifbmt-0elj-uWZZ-zSLG-plA8-8hd3-JG298b', 'metadataDevice': '360014059618895272774e97a2aaf5dd6', 'state': 'OK', 'version': '4', 'role': 'Regular', 'type': 'ISCSI', 'class': 'Data', 'pool': [], 'name': 'iscsi'}} from=::ffff:192.168.201.4,45636, flow_id=2c187699, task_id=c2080b61-d4a5-4bdb-9d75-f81580a8257a (api:
>>
>> vdsm host1:
>>
>> 2017-11-18 06:16:34,315-0500 ERROR (monitor/c65437c) [storage.Monitor] Setting up monitor for c65437ce-339f-4b01-aeb5-45c1d486bf49 failed (monitor:329)
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 326, in _setupLoop
>>     self._setupMonitor()
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 348, in _setupMonitor
>>     self._produceDomain()
>>   File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 177, in wrapper
>>     value = meth(self, *a, **kw)
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line 366, in _produceDomain
>>     self.domain = sdCache.produce(self.sdUUID)
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 110, in produce
>>     domain.getRealDomain()
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51, in getRealDomain
>>     return self._cache._realProduce(self._sdUUID)
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134, in _realProduce
>>     domain = self._findDomain(sdUUID)
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151, in _findDomain
>>     return findMethod(sdUUID)
>>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 176, in _findUnfetchedDomain
>>     raise se.StorageDomainDoesNotExist(sdUUID)
>> StorageDomainDoesNotExist: Storage domain does not exist: (u'c65437ce-339f-4b01-aeb5-45c1d486bf49',)
>> 2017-11-18 06:16:40,377-0500 INFO (jsonrpc/7) [api.host] START getStats() from=::ffff:192.168.201.4,58722 (api:46)
>> 2017-11-18 06:16:40,378-0500 INFO (jsonrpc/7) [vdsm.api] START repoStats(domains=()) from=::ffff:192.168.201.4,58722, task_id=8fb74944-08c0-491e-ad55-a7a9f0a11ef8 (api:46)
>> 2017-11-18 06:16:40,379-0500 INFO (jsonrpc/7) [vdsm.api] FINISH repoStats return={u'c65437ce-339f-4b01-aeb5-45c1d486bf49': {'code': 358, 'actual': True, 'version': -1, 'acquired': False, 'delay': '0', 'lastCheck': '6.1', 'valid': False}, u'cc61e074-a3b6-4371-9185-66079a39f123': {'code': 0, 'actual': True, 'version': 4, 'acquired': True, 'delay': '0.00103987', 'lastCheck': '6.5', 'valid': True}} from=::ffff:192.168.201.4,58722, task_id=8fb74944-08c0-491e-ad55-a7a9f0a11ef8 (api:52)
>>
>> engine log:
>>
>> 2017-11-18 06:15:54,040-05 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsProxy] (EE-ManagedThreadFactory-engine-Thread-29) [4ce8aff3] Domain 'c65437ce-339f-4b01-aeb5-45c1d486bf49:nfs' was reported with error code '358'
>> 2017-11-18 06:15:54,041-05 ERROR [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (EE-ManagedThreadFactory-engine-Thread-29) [4ce8aff3] Storage Domain 'nfs' of pool 'test-dc' is in problem in host 'lago-basic-suite-master-host-1'
>> 2017-11-18 06:15:54,045-05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-29) [4ce8aff3] EVENT_ID: VDS_STORAGE_VDS_STATS_FAILED(189), Host lago-basic-suite-master-host-1 reports about one of the Active Storage Domains as Problematic.
>>
>> </error>
_______________________________________________
Devel mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/devel
