On Sun, Mar 1, 2020 at 10:10 AM Yedidyah Bar David <[email protected]> wrote:
>
> Hi all,
>
> On Sun, Mar 1, 2020 at 6:06 AM <[email protected]> wrote:
> >
> > Project:
> > https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-4.3/
> > Build:
> > https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-4.3/366/
>
> I think the root cause is:
>
> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-4.3/366/artifact/exported-artifacts/test_logs/he-basic-suite-4.3/post-008_restart_he_vm.py/lago-he-basic-suite-4-3-host-0/_var_log/ovirt-hosted-engine-ha/broker.log
>
> StatusStorageThread::ERROR::2020-02-29 23:03:04,671::status_broker::90::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run) Failed to update state.
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 82, in run
>     if (self._status_broker._inquire_whiteboard_lock() or
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 195, in _inquire_whiteboard_lock
>     self.host_id, self._lease_file)
> SanlockException: (104, 'Sanlock lockspace inquire failure', 'Connection reset by peer')

Can you point us to the source that uses the sanlock API?

The message looks like a client error accessing the sanlock server socket
(maybe someone restarted sanlock at that point?), but it may also be an
error code reused for a sanlock internal error when accessing the storage.

Usually you can find more info about the error in /var/log/sanlock.log.

> This caused the broker to restart itself,

Restarting because sanlock failed does sound like useful error handling
for broker clients.

> and while it was doing that,
> OST did 'hosted-engine --vm-status --json', which failed, thus failing
> the build.

If the broker may restart itself on errors, clients need a retry
mechanism to deal with the restarts, so the test should probably retry
before it fails.

> This seems to me like another case of a communication problem in CI.
> Not sure what else could have caused it to fail to inquire the status
> of the lock. This (communication) issue was mentioned several times in
> the past already. Are we doing anything re this?
>
> Thanks and best regards,
>
> > Build Number: 366
> > Build Status: Failure
> > Triggered By: Started by timer
> >
> > -------------------------------------
> > Changes Since Last Success:
> > -------------------------------------
> > Changes for Build #366
> > [Marcin Sobczyk] el8: Don't try to collect whole '/etc/httpd' dir
> >
> > -----------------
> > Failed Tests:
> > -----------------
> > 1 tests failed.
> > FAILED: 008_restart_he_vm.clear_global_maintenance
> >
> > Error Message:
> > 1 != 0
> > -------------------- >> begin captured logging << --------------------
> > root: INFO: Waiting For System Stability...
> > lago.ssh: DEBUG: start task:29a79ef5-e211-4672-ac5b-12bf0e5f8ee9:Get ssh client for lago-he-basic-suite-4-3-host-0:
> > lago.ssh: DEBUG: end task:29a79ef5-e211-4672-ac5b-12bf0e5f8ee9:Get ssh client for lago-he-basic-suite-4-3-host-0:
> > lago.ssh: DEBUG: Running 9a90ca60 on lago-he-basic-suite-4-3-host-0: hosted-engine --set-maintenance --mode=none
> > lago.ssh: DEBUG: Command 9a90ca60 on lago-he-basic-suite-4-3-host-0 returned with 1
> > lago.ssh: DEBUG: Command 9a90ca60 on lago-he-basic-suite-4-3-host-0 errors:
> >  Cannot connect to the HA daemon, please check the logs.
> >
> > ovirtlago.testlib: ERROR: * Unhandled exception in <function <lambda> at 0x7f52673872a8>
> > Traceback (most recent call last):
> >   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 234, in assert_equals_within
> >     res = func()
> >   File "/home/jenkins/agent/workspace/ovirt-system-tests_he-basic-suite-4.3/ovirt-system-tests/he-basic-suite-4.3/test-scenarios/008_restart_he_vm.py", line 87, in <lambda>
> >     lambda: _set_and_test_maintenance_mode(host, False)
> >   File "/home/jenkins/agent/workspace/ovirt-system-tests_he-basic-suite-4.3/ovirt-system-tests/he-basic-suite-4.3/test-scenarios/008_restart_he_vm.py", line 108, in _set_and_test_maintenance_mode
> >     nt.assert_equals(ret.code, 0)
> >   File "/usr/lib64/python2.7/unittest/case.py", line 553, in assertEqual
> >     assertion_func(first, second, msg=msg)
> >   File "/usr/lib64/python2.7/unittest/case.py", line 546, in _baseAssertEqual
> >     raise self.failureException(msg)
> > AssertionError: 1 != 0
> > --------------------- >> end captured logging << ---------------------
> >
> > Stack Trace:
> >   File "/usr/lib64/python2.7/unittest/case.py", line 369, in run
> >     testMethod()
> >   File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
> >     self.test(*self.arg)
> >   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 142, in wrapped_test
> >     test()
> >   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 60, in wrapper
> >     return func(get_test_prefix(), *args, **kwargs)
> >   File "/home/jenkins/agent/workspace/ovirt-system-tests_he-basic-suite-4.3/ovirt-system-tests/he-basic-suite-4.3/test-scenarios/008_restart_he_vm.py", line 87, in clear_global_maintenance
> >     lambda: _set_and_test_maintenance_mode(host, False)
> >   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 282, in assert_true_within_short
> >     assert_equals_within_short(func, True, allowed_exceptions)
> >   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 266, in assert_equals_within_short
> >     func, value, SHORT_TIMEOUT, allowed_exceptions=allowed_exceptions
> >   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 234, in assert_equals_within
> >     res = func()
> >   File "/home/jenkins/agent/workspace/ovirt-system-tests_he-basic-suite-4.3/ovirt-system-tests/he-basic-suite-4.3/test-scenarios/008_restart_he_vm.py", line 87, in <lambda>
> >     lambda: _set_and_test_maintenance_mode(host, False)
> >   File "/home/jenkins/agent/workspace/ovirt-system-tests_he-basic-suite-4.3/ovirt-system-tests/he-basic-suite-4.3/test-scenarios/008_restart_he_vm.py", line 108, in _set_and_test_maintenance_mode
> >     nt.assert_equals(ret.code, 0)
> >   File "/usr/lib64/python2.7/unittest/case.py", line 553, in assertEqual
> >     assertion_func(first, second, msg=msg)
> >   File "/usr/lib64/python2.7/unittest/case.py", line 546, in _baseAssertEqual
> >     raise self.failureException(msg)
> >
>
> --
> Didi
> _______________________________________________
> Infra mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/QGRYTQWRPEF5Y2UUQI7UML5JE66GNXVA/
_______________________________________________
List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/C5QA2PSJL3VHUVTGZIVDQG7SVWS43ZFH/
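
To illustrate the retry mechanism suggested above for clients of a broker that may restart itself, here is a minimal sketch in Python. It is not the OST or ovirt-hosted-engine-ha code: the helper name run_with_retry, its parameters (attempts, delay, runner, sleep) and the default values are all hypothetical, and it assumes that a non-zero exit status is a sufficient signal to try again.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=5, delay=3.0,
                   runner=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits with status 0, trying at most `attempts` times.

    Hypothetical helper, not part of OST or ovirt-hosted-engine-ha.
    `runner` and `sleep` are injectable so the loop can be exercised
    without a real host.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    result = None
    for attempt in range(1, attempts + 1):
        result = runner(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        if result.returncode == 0:
            return result
        # Non-zero exit: the broker may be restarting. Wait before the
        # next attempt, but do not sleep after the final one.
        if attempt < attempts:
            sleep(delay)
    raise RuntimeError(
        "command %r still failing after %d attempts (rc=%d): %s"
        % (cmd, attempts, result.returncode, result.stderr)
    )
```

A test could then call run_with_retry(["hosted-engine", "--vm-status", "--json"]) and parse result.stdout with json.loads, failing only once all attempts are exhausted. The attempt count and delay would need tuning to how long a broker restart actually takes.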
