On Sun, Mar 18, 2018 at 4:21 PM Yedidyah Bar David <[email protected]> wrote:
> On Sun, Mar 18, 2018 at 2:48 PM, Yedidyah Bar David <[email protected]> > wrote: > > On Sun, Mar 18, 2018 at 1:45 PM, Yedidyah Bar David <[email protected]> > wrote: > >> On Sun, Mar 18, 2018 at 11:20 AM, <[email protected]> > wrote: > >>> Project: > http://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-4.1/ > >>> Build: > http://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-4.1/223/ > >>> Build Number: 223 > >>> Build Status: Still Failing > >>> Triggered By: Started by timer > >> > >> It was broken by: > >> > >> [1] https://gerrit.ovirt.org/88483 > >> > >> It should be fixed by: > >> > >> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1554283 > >> > >> [2] is on modified, no idea about its status other than that. > >> > >> I didn't intend to merge [1] before [2] is fixed, not sure if > >> Sandro didn't notice my comment there, or thought that [2] is fixed. > >> > >> If it's annoying we can revert [1] and re-merge when [2] is fixed. > > > > Talked with Gal, and pushed this, which should hopefully fix: > > > > https://gerrit.ovirt.org/89136 > > It indeed seems to fix [1][2]: > > 13:13:24 # he_get_shared_config: > 13:13:26 # he_get_shared_config: Success (in 0:00:01) > 13:13:26 # sleep: > 13:15:26 # sleep: Success (in 0:02:00) > 13:15:26 # add_he_hosts: > 13:16:18 # add_he_hosts: Success (in 0:00:52) > 13:16:18 # he_check_ha_agent: > 13:16:19 # he_check_ha_agent: Success (in 0:00:00) > > But later fails: > > 13:16:20 # add_secondary_storage_domains: > 13:19:30 Error while running thread > 13:19:30 Traceback (most recent call last): > 13:19:30 File "/usr/lib/python2.7/site-packages/lago/utils.py", line > 58, in _ret_via_queue > 13:19:30 queue.put({'return': func()}) > 13:19:30 File > > "/home/jenkins/workspace/ovirt-system-tests_master_check-patch-el7-x86_64/ovirt-system-tests/he-basic-suite-4.1/test-scenarios/002_bootstrap.py", > line 491, in add_nfs_storage_domain > 13:19:30 add_generic_nfs_storage_domain(prefix, SD_NFS_NAME, > SD_NFS_HOST_NAME, SD_NFS_PATH) > 13:19:30 File > > "/home/jenkins/workspace/ovirt-system-tests_master_check-patch-el7-x86_64/ovirt-system-tests/he-basic-suite-4.1/test-scenarios/002_bootstrap.py", > line 496, in add_generic_nfs_storage_domain > 13:19:30 add_generic_nfs_storage_domain_4(prefix, sd_nfs_name, > nfs_host_name, mount_path, sd_format, sd_type, nfs_version) > 13:19:30 File > > "/home/jenkins/workspace/ovirt-system-tests_master_check-patch-el7-x86_64/ovirt-system-tests/he-basic-suite-4.1/test-scenarios/002_bootstrap.py", > line 552, in add_generic_nfs_storage_domain_4 > 13:19:30 _add_storage_domain_4(api, p) > 13:19:30 File > > "/home/jenkins/workspace/ovirt-system-tests_master_check-patch-el7-x86_64/ovirt-system-tests/he-basic-suite-4.1/test-scenarios/002_bootstrap.py", > line 466, in _add_storage_domain_4 > 13:19:30 id=sd.id, > 13:19:30 File > "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", line 2219, > in add > 13:19:30 return self._internal_add(storage_domain, headers, query, > wait) > 13:19:30 File > "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 223, > in _internal_add > 13:19:30 return future.wait() if wait else future > 13:19:30 File > "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 53, in > wait > 13:19:30 return self._code(response) > 13:19:30 File > "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 220, > in callback > 13:19:30 self._check_fault(response) > 13:19:30 File > "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 123, > in _check_fault > 13:19:30 self._raise_error(response, body) > 13:19:30 File > "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 109, > in _raise_error > 13:19:30 raise error > 13:19:30 Error: Fault reason is "Operation Failed". Fault detail is > "[Storage domain cannot be reached. Please ensure it is accessible > from the host(s).]". HTTP response code is 400. > 13:19:30 Error while running thread > > I'll retrigger now, but perhaps someone from storage wants to check. > vdsm log has[3]: > > 2018-03-18 08:59:52,918-0400 INFO (jsonrpc/3) [jsonrpc.JsonRpcServer] > RPC call Host.getHardwareInfo succeeded in 0.00 seconds (__init__:539) > 2018-03-18 08:59:52,924-0400 INFO (jsonrpc/4) [vdsm.api] START > prepareImage(sdUUID=u'424e809d-b7ad-4ed5-b6a1-5426d373a5d2', > spUUID=u'8509c64f-cdd7-4713-9e09-a79d90ba26ed', > imgUUID=u'718090ba-b36e-45cc-bcd6-597c16a766b9', > leafUUID=u'23ee843b-20e0-4afe-98dc-e165334ac710', allowIllegal=False) > from=::1,36020, task_id=92aeabf2-084a-4c98-a6e5-03efae38e7b3 (api:46) > 2018-03-18 08:59:52,928-0400 INFO (jsonrpc/4) [vdsm.api] FINISH > prepareImage error=Volume does not exist: > (u'23ee843b-20e0-4afe-98dc-e165334ac710',) from=::1,36020, > task_id=92aeabf2-084a-4c98-a6e5-03efae38e7b3 (api:50) > 2018-03-18 08:59:52,928-0400 ERROR (jsonrpc/4) > [storage.TaskManager.Task] > (Task='92aeabf2-084a-4c98-a6e5-03efae38e7b3') Unexpected error > (task:872) > Traceback (most recent call last): > File "/usr/share/vdsm/storage/task.py", line 879, in _run > return fn(*args, **kargs) > File "<string>", line 2, in prepareImage > File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 48, in > method > ret = func(*args, **kwargs) > File "/usr/share/vdsm/storage/hsm.py", line 3137, in prepareImage > raise se.VolumeDoesNotExist(leafUUID) > VolumeDoesNotExist: Volume does not exist: > (u'23ee843b-20e0-4afe-98dc-e165334ac710',) > Immediately after this failed prepare, we see: 2018-03-18 08:59:52,941-0400 INFO (jsonrpc/7) [vdsm.api] START createVolume(sdUUID=u'424e809d-b7ad-4ed5-b6a1-5426d373a5d2', spUUID=u'8509c64f-cdd7-4713-9e09-a79d90ba26ed', imgUUID=u'718090ba-b36e-45cc-bcd6-597c16a766b9', size=u'1048576', volFormat=5, preallocate=1, diskType=2, volUUID=u'23ee843b-20e0-4afe-98dc-e165334ac710', desc=u'hosted-engine.lockspace', srcImgUUID=u'00000000-0000-0000-0000-000000000000', srcVolUUID=u'00000000-0000-0000-0000-000000000000', initialSize=None) from=::1,36020, task_id=c4c73314-ee10-419a-8265-b8b87763e99f (api:46) So that volume really did not exit when it was prepared, which means it is a client error, not vdsm error. The caller (hosted engine agent?) should use Volume.getInfo to check if a volume exits. Non existing volume is considered expected error in this call and will be logged as with INFO log level. See https://github.com/oVirt/vdsm/blob/f286cf7825578d8997ca6abc0801a823dee247ef/lib/vdsm/storage/hsm.py#L3075 > [1] > http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/4537/ > [2] > http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/4537/consoleFull > [3] > http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/4537/artifact/exported-artifacts/he-basic-suite-4.1__logs/test_logs/he-basic-suite-4.1/post-002_bootstrap.py/lago-he-basic-suite-4-1-host0/_var_log/vdsm/vdsm.log > > > > >> > >>> > >>> ------------------------------------- > >>> Changes Since Last Success: > >>> ------------------------------------- > >>> Changes for Build #218 > >>> [Martin Sivak] Cleanup of test code - next() replaced with any() > >>> > >>> > >>> Changes for Build #219 > >>> [Martin Sivak] Cleanup of test code - next() replaced with any() > >>> > >>> > >>> Changes for Build #220 > >>> [Martin Sivak] Cleanup of test code - next() replaced with any() > >>> > >>> > >>> Changes for Build #221 > >>> [Martin Sivak] Cleanup of test code - next() replaced with any() > >>> > >>> > >>> Changes for Build #222 > >>> [Martin Sivak] Cleanup of test code - next() replaced with any() > >>> > >>> > >>> Changes for Build #223 > >>> [Martin Sivak] Cleanup of test code - next() replaced with any() > >>> > >>> > >>> > >>> > >>> ----------------- > >>> Failed Tests: > >>> ----------------- > >>> 1 tests failed. > >>> FAILED: 002_bootstrap.he_check_ha_agent > >>> > >>> Error Message: > >>> False is not true > >>> -------------------- >> begin captured logging << -------------------- > >>> lago.ssh: DEBUG: start task:cb75d7f3-e333-47f1-b3e9-76ff9c0b4b21:Get > ssh client for lago-he-basic-suite-4-1-host0: > >>> lago.ssh: DEBUG: end task:cb75d7f3-e333-47f1-b3e9-76ff9c0b4b21:Get ssh > client for lago-he-basic-suite-4-1-host0: > >>> lago.ssh: DEBUG: Running 7e67a956 on lago-he-basic-suite-4-1-host0: > grep Permission denied /var/log/ovirt-hosted-engine-ha/agent.log > >>> lago.ssh: DEBUG: Command 7e67a956 on lago-he-basic-suite-4-1-host0 > returned with 2 > >>> lago.ssh: DEBUG: Command 7e67a956 on lago-he-basic-suite-4-1-host0 > output: > >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13] > Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf' > >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13] > Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf' > >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13] > Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf' > >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13] > Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf' > >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13] > Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf' > >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13] > Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf' > >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13] > Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf' > >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13] > Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf' > >>> /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13] > Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf' > >>> > >>> lago.ssh: DEBUG: Command 7e67a956 on lago-he-basic-suite-4-1-host0 > errors: > >>> grep: denied: No such file or directory > >>> > >>> --------------------- >> end captured logging << --------------------- > >>> > >>> Stack Trace: > >>> File "/usr/lib64/python2.7/unittest/case.py", line 369, in run > >>> testMethod() > >>> File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in > runTest > >>> self.test(*self.arg) > >>> File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line > 129, in wrapped_test > >>> test() > >>> File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line > 59, in wrapper > >>> return func(get_test_prefix(), *args, **kwargs) > >>> File > "/home/jenkins/workspace/ovirt-system-tests_he-basic-suite-4.1/ovirt-system-tests/he-basic-suite-4.1/test-scenarios/002_bootstrap.py", > line 1036, in he_check_ha_agent > >>> 'Permission denied' not in result.out > >>> File "/usr/lib64/python2.7/unittest/case.py", line 462, in assertTrue > >>> raise self.failureException(msg) > >>> "False is not true\n-------------------- >> begin captured logging << > --------------------\nlago.ssh: DEBUG: start > task:cb75d7f3-e333-47f1-b3e9-76ff9c0b4b21:Get ssh client for > lago-he-basic-suite-4-1-host0:\nlago.ssh: DEBUG: end > task:cb75d7f3-e333-47f1-b3e9-76ff9c0b4b21:Get ssh client for > lago-he-basic-suite-4-1-host0:\nlago.ssh: DEBUG: Running 7e67a956 on > lago-he-basic-suite-4-1-host0: grep Permission denied > /var/log/ovirt-hosted-engine-ha/agent.log\nlago.ssh: DEBUG: Command > 7e67a956 on lago-he-basic-suite-4-1-host0 returned with 2\nlago.ssh: DEBUG: > Command 7e67a956 on lago-he-basic-suite-4-1-host0 output:\n > /var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13] Permission > denied: > '/var/run/ovirt-hosted-engine-ha/vm.conf'\n/var/log/ovirt-hosted-engine-ha/agent.log:IOError: > [Errno 13] Permission denied: > '/var/run/ovirt-hosted-engine-ha/vm.conf'\n/var/log/ovirt-hosted-engine-ha/agent.log:IOError: > [Errno 13] Permission denied: '/var/run/ovirt-hosted-engine-ha/vm.conf > '\n/var/log/ovirt-hosted-engine-ha/agent.log:IOError: [Errno 13] > Permission denied: > '/var/run/ovirt-hosted-engine-ha/vm.conf'\n/var/log/ovirt-hosted-engine-ha/agent.log:IOError: > [Errno 13] Permission denied: > '/var/run/ovirt-hosted-engine-ha/vm.conf'\n/var/log/ovirt-hosted-engine-ha/agent.log:IOError: > [Errno 13] Permission denied: > '/var/run/ovirt-hosted-engine-ha/vm.conf'\n/var/log/ovirt-hosted-engine-ha/agent.log:IOError: > [Errno 13] Permission denied: > '/var/run/ovirt-hosted-engine-ha/vm.conf'\n/var/log/ovirt-hosted-engine-ha/agent.log:IOError: > [Errno 13] Permission denied: > '/var/run/ovirt-hosted-engine-ha/vm.conf'\n/var/log/ovirt-hosted-engine-ha/agent.log:IOError: > [Errno 13] Permission denied: > '/var/run/ovirt-hosted-engine-ha/vm.conf'\n\nlago.ssh: DEBUG: Command > 7e67a956 on lago-he-basic-suite-4-1-host0 errors:\n grep: denied: No such > file or directory\n\n--------------------- >> end captured logging << > ---------------------" > >> > >> > >> > >> -- > >> Didi > > > > > > > > -- > > Didi > > > > -- > Didi > _______________________________________________ > Infra mailing list > [email protected] > http://lists.ovirt.org/mailman/listinfo/infra >
_______________________________________________ Infra mailing list [email protected] http://lists.ovirt.org/mailman/listinfo/infra
