On Mon, Mar 8, 2021 at 2:26 PM Yedidyah Bar David <[email protected]> wrote: > > On Sun, Mar 7, 2021 at 8:24 PM Yedidyah Bar David <[email protected]> wrote: > > > > On Sun, Mar 7, 2021 at 4:06 PM Yedidyah Bar David <[email protected]> wrote: > > > > > > On Sun, Mar 7, 2021 at 5:12 AM <[email protected]> wrote: > > > > > > > > Project: > > > > https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/ > > > > Build: > > > > https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1943/ > > > > > > Failed during > > > -basic-suite-master.test-scenarios.test_004_basic_sanity.test_ha_recovery > > > due to timeout. Can't tell exactly why. Now rebuilt, to see if it > > > happens again. Basic suite did pass this point with (mostly) the same > > > code, and there shouldn't be anything HE-specific in this test. FYI. > > > > It did fail again: > > > > https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1944/ > > > > Again, engine.log [1] has no 'VM_DOWN_ERROR'. Any clue? > > OK. It seems like an OST issue. With [1], I get: > > 2021-03-08 13:05:10+0100,088 DEBUG (MainThread) > [test_004_basic_sanity] test_ha_recovery: > ansible_host=ModuleMapper<config_builder=ConfigBuilder<inventory=/tmp/tmp_fdlcj8y, > host_pattern=la > go-he-basic-suite-master-host-0.lago.local>> (test_004_basic_sanity:1072) > 2021-03-08 13:05:10+0100,650 DEBUG (MainThread) > [test_004_basic_sanity] test_ha_recovery: pgrep_res={'cmd': 'dig > +short vm0', 'stdout': '192.168.200.147', 'stderr': '', 'rc': 0, > 'start': ' > 2021-03-08 13:01:59.636833', 'end': '2021-03-08 13:01:59.656727', > 'delta': '0:00:00.019894', 'changed': True, 'invocation': > {'module_args': {'_raw_params': 'dig +short vm0', '_uses_shell': T > rue, 'warn': True, 'stdin_add_newline': True, 'strip_empty_ends': > True, 'argv': None, 'chdir': None, 'executable': None, 'creates': > None, 'removes': None, 'stdin': None}}, 'stdout_lines': [' > 192.168.200.147'], 'stderr_lines': [], '_ansible_no_log': False} > (test_004_basic_sanity:1074) > 2021-03-08 13:05:10+0100,650 DEBUG (MainThread) > [test_004_basic_sanity] test_ha_recovery: pid=192.168.200.147 > (test_004_basic_sanity:1076) > 2021-03-08 13:05:11+0100,194 DEBUG (MainThread) > [test_004_basic_sanity] test_ha_recovery: kill_res={'cmd': 'dig +short > vm0', 'stdout': '192.168.200.147', 'stderr': '', 'rc': 0, 'start': '2 > 021-03-08 13:01:59.636833', 'end': '2021-03-08 13:01:59.656727', > 'delta': '0:00:00.019894', 'changed': True, 'invocation': > {'module_args': {'_raw_params': 'dig +short vm0', '_uses_shell': Tr > ue, 'warn': True, 'stdin_add_newline': True, 'strip_empty_ends': True, > 'argv': None, 'chdir': None, 'executable': None, 'creates': None, > 'removes': None, 'stdin': None}}, 'stdout_lines': ['1 > 92.168.200.147'], 'stderr_lines': [], '_ansible_no_log': False} > (test_004_basic_sanity:1078)
I think that this fails due to [1] and due to the fact that in HE suites, we deploy host-0 as part of HE deploy (not OST), and with the FQDN - which causes the engine to return this host FQDN when asking 'which host runs some VM'. Pushed [2], let's see. [1] https://gerrit.ovirt.org/c/ovirt-system-tests/+/113391 [2] https://gerrit.ovirt.org/c/ovirt-system-tests/+/113814 > > [1] https://gerrit.ovirt.org/c/ovirt-system-tests/+/113804 > > > > > [1] > > https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1944/artifact/exported-artifacts/test_logs/he-basic-suite-master/lago-he-basic-suite-master-engine/_var_log/ovirt-engine/engine.log > > > > > > > > Some more details: > > > > > > The function is: > > > > > > @order_by(_TEST_LIST) > > > def test_ha_recovery(engine_api, get_ansible_host_for_vm): > > > engine = engine_api.system_service() > > > with engine_utils.wait_for_event(engine, [119, 9602, 506]): > > > # VM_DOWN_ERROR event(119) > > > # HA_VM_FAILED event event(9602) > > > # VDS_INITIATED_RUN_VM event(506) > > > ansible_host = get_ansible_host_for_vm(VM2_NAME) > > > pid = ansible_host.shell('pgrep -f > > > qemu.*guest=vm2')['stdout'].strip() > > > ansible_host.shell('kill -KILL {}'.format(pid)) > > > > > > vm_service = test_utils.get_vm_service(engine, VM2_NAME) > > > assertions.assert_true_within_long( > > > lambda: > > > vm_service.get().status == types.VmStatus.UP > > > ) > > > with engine_utils.wait_for_event(engine, 33): # USER_STOP_VM event > > > vm_service.stop() > > > > > > It failed inside the 'with' statement. Can't find any of the expected > > > events in engine.log. In last successful I see all three of them. > > > > > > We do not actively log anything about running the pgrep command, but > > > it's logged by ansible on the host, to /var/log/messages. I see it in > > > last successful, but not in above failed run, in any of the two hosts. > > > So most likely it didn't even get to run pgrep. > > > > > > IMO we probably want some more information logged in such a case, to > > > allow further investigation. > > > > > > Best regards, > > > > > > > Build Number: 1943 > > > > Build Status: Still Failing > > > > Triggered By: Started by timer > > > > > > > > ------------------------------------- > > > > Changes Since Last Success: > > > > ------------------------------------- > > > > Changes for Build #1939 > > > > [hbraha] ost: Check if user custom repos are used on VM > > > > > > > > > > > > Changes for Build #1940 > > > > [hbraha] network: add check_repos fixture > > > > > > > > > > > > Changes for Build #1941 > > > > [Andrej Cernek] network, pylint: Simplify pylintrc file > > > > > > > > > > > > Changes for Build #1942 > > > > [Andrej Cernek] network, pylint: Simplify pylintrc file > > > > > > > > > > > > Changes for Build #1943 > > > > [Andrej Cernek] network, pylint: Simplify pylintrc file > > > > > > > > > > > > > > > > > > > > ----------------- > > > > Failed Tests: > > > > ----------------- > > > > 1 tests failed. > > > > FAILED: > > > > he-basic-suite-master.test-scenarios.test_004_basic_sanity.test_ha_recovery > > > > > > > > Error Message: > > > > AssertionError: False != True after 600 seconds > > > > > > > > Stack Trace: > > > > engine_api = <ovirtsdk4.Connection object at 0x7f408114c1d0> > > > > get_ansible_host_for_vm = <function > > > > get_ansible_host_for_vm.<locals>.ansible_host_for at 0x7f4081171950> > > > > > > > > @order_by(_TEST_LIST) > > > > def test_ha_recovery(engine_api, get_ansible_host_for_vm): > > > > engine = engine_api.system_service() > > > > with engine_utils.wait_for_event(engine, [119, 9602, 506]): > > > > # VM_DOWN_ERROR event(119) > > > > # HA_VM_FAILED event event(9602) > > > > # VDS_INITIATED_RUN_VM event(506) > > > > ansible_host = get_ansible_host_for_vm(VM2_NAME) > > > > pid = ansible_host.shell('pgrep -f > > > > qemu.*guest=vm2')['stdout'].strip() > > > > > ansible_host.shell('kill -KILL {}'.format(pid)) > > > > > > > > ../he-basic-suite-master/test-scenarios/test_004_basic_sanity.py:1070: > > > > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > > > > _ _ _ _ > > > > /usr/lib64/python3.6/contextlib.py:88: in __exit__ > > > > next(self.gen) > > > > ../ost_utils/ost_utils/engine_utils.py:42: in wait_for_event > > > > lambda: > > > > ../ost_utils/ost_utils/assertions.py:96: in assert_true_within_long > > > > assert_equals_within_long(func, True, allowed_exceptions) > > > > ../ost_utils/ost_utils/assertions.py:81: in assert_equals_within_long > > > > func, value, LONG_TIMEOUT, allowed_exceptions=allowed_exceptions > > > > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > > > > _ _ _ _ > > > > > > > > func = <function wait_for_event.<locals>.<lambda> at 0x7f40810d5bf8> > > > > value = True, timeout = 600, allowed_exceptions = [], initial_wait = 0 > > > > error_message = 'False != True after 600 seconds' > > > > > > > > def assert_equals_within( > > > > func, value, timeout, allowed_exceptions=None, initial_wait=10, > > > > error_message=None > > > > ): > > > > allowed_exceptions = allowed_exceptions or [] > > > > res = '<no-result-obtained>' > > > > with _EggTimer(timeout) as timer: > > > > while not timer.elapsed(): > > > > try: > > > > res = func() > > > > if res == value: > > > > return > > > > except Exception as exc: > > > > if _instance_of_any(exc, allowed_exceptions): > > > > time.sleep(3) > > > > continue > > > > > > > > LOGGER.exception("Unhandled exception in %s", func) > > > > raise > > > > > > > > if initial_wait == 0: > > > > time.sleep(3) > > > > else: > > > > time.sleep(initial_wait) > > > > initial_wait = 0 > > > > try: > > > > if error_message is None: > > > > error_message = '%s != %s after %s seconds' % (res, > > > > value, timeout) > > > > > raise AssertionError(error_message) > > > > E AssertionError: False != True after 600 seconds > > > > > > > > ../ost_utils/ost_utils/assertions.py:59: AssertionError > > > > > > > > > > > > -- > > > Didi > > > > > > > > -- > > Didi > > > > -- > Didi -- Didi _______________________________________________ Devel mailing list -- [email protected] To unsubscribe send an email to [email protected] Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/WISTZGFP2JNV3DSDLLWPEWVKM7QGTWUF/
