Hello all,
lately I have seen multiple failures of the add_master_storage_domain test
that were related neither to the changes under test nor to any infra issue.
One example can be found here [1].
After an investigation, with a lot of help from Milan, it looks like the host
suddenly drops from the up state to some other, non-up state.


   1. add_storage_domain picks a random host that is in the up state.
   2. Meanwhile, the engine starts a fence action for that host, so something
      has probably gone wrong with it. The fence action fails with:
      [org.ovirt.engine.core.bll.pm.FenceProxyLocator]
      (EE-ManagedThreadFactory-engineScheduled-Thread-38) [6692895f] Can not run
      fence action on host 'lago-basic-suite-master-host-0', no suitable proxy
      host was found.
   3. The test fails because the domain cannot be attached to a host that is
      not up:
      [org.ovirt.engine.api.restapi.resource.AbstractBackendResource]
      (default task-1) [] Operation Failed: [Cannot add storage server connection
      when Host status is not up]
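
The sequence above looks like a check-then-act race. Below is a minimal toy
model of it, not the actual test or engine code: the hosts dictionary, the
function names and the "non_responsive" status label are all hypothetical and
only stand in for the real objects.

```python
import random


class HostNotUpError(Exception):
    """Raised when an operation needs an 'up' host but the host is not up."""


def pick_up_host(hosts, rng=random):
    # Step 1: choose any host that is currently reported as 'up'.
    up = [name for name, status in hosts.items() if status == "up"]
    if not up:
        raise HostNotUpError("no host in up state")
    return rng.choice(up)


def add_storage_connection(hosts, name):
    # Step 3: the request is rejected if the host is no longer up.
    if hosts[name] != "up":
        raise HostNotUpError(
            "Cannot add storage server connection when Host status is not up"
        )
    return True


# Toy inventory; the real suite has more than one host.
hosts = {"lago-basic-suite-master-host-0": "up"}
chosen = pick_up_host(hosts)

# Step 2: between the check and the use, the host leaves the up state.
hosts[chosen] = "non_responsive"  # hypothetical status label

try:
    add_storage_connection(hosts, chosen)
    outcome = "attached"
except HostNotUpError as err:
    outcome = f"failed: {err}"

print(outcome)
```

The point of the sketch is only that the status seen in step 1 can be stale
by the time step 3 runs; it says nothing about why the host leaves the up
state in the first place.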

For easier orientation in the failed job's engine log [2], the fence action
for the host fails at
:46:12,842-04
and the engine learns it cannot connect storage to the host at
:46:16,105-04

The add_master_storage_domain test itself starts at ~ :46:13,753 (according
to the lago log).
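
To make the ordering explicit, here is a small sketch that sorts the three
timestamps and prints their offsets from the first one. The hour is left out,
as above, and the sketch assumes that the engine log and lago log clocks are
directly comparable, which has not been verified.

```python
from datetime import timedelta


def offset(stamp):
    # stamp has the form "MM:SS,mmm"; the hour is omitted, as in the report.
    minutes, rest = stamp.split(":")
    seconds, millis = rest.split(",")
    return timedelta(
        minutes=int(minutes), seconds=int(seconds), milliseconds=int(millis)
    )


events = {
    "fence action fails (engine log)": offset("46:12,842"),
    "test starts (lago log, approximate)": offset("46:13,753"),
    "storage connection rejected (engine log)": offset("46:16,105"),
}

# Sort by time and print each event relative to the earliest one.
ordered = sorted(events.items(), key=lambda item: item[1])
first = ordered[0][1]
for name, when in ordered:
    print(f"+{(when - first).total_seconds():.3f}s  {name}")
```

Under that assumption, the fence action fails roughly 0.9 s before the test
starts, and the storage connection is rejected about 2.4 s after it starts.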

Could you please check this?
Thanks

[1] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/15829
[2] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/15829/artifact/basic-suite.el7.x86_64/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log