This still uses the older daemon; the patch improving logging was merged today at 13:02. Please check again with the current version.
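About the logging change mentioned above: in the quoted run, the daemon's traceback only reached /var/log/messages through systemd, and Didi noted that the daemon's own log had nothing but 'Starting' lines. The general pattern is to catch any startup exception, write it with its traceback to the daemon log, and re-raise. Below is a minimal Python 3 sketch of that pattern. It is not the code from https://gerrit.ovirt.org/83670/, and the logger name, start_server() and main() are hypothetical stand-ins.

```python
import logging
import socket

# Hypothetical logger name; the real daemon's logging setup may differ.
log = logging.getLogger("imageio-daemon-sketch")


def start_server(address):
    """Hypothetical stand-in for daemon startup: bind and listen.

    Raises OSError (socket.error on Python 2) on failure, e.g.
    EADDRINUSE when another socket already listens on `address`.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(address)
        sock.listen(5)
    except OSError:
        sock.close()
        raise
    return sock


def main(address):
    log.info("Starting (address=%s)", address)
    try:
        return start_server(address)
    except Exception:
        # Write the full traceback to the daemon log, then re-raise so
        # the process still exits with an error and systemd sees it.
        log.exception("Server failed to start")
        raise
```

In a real daemon the logger would be configured to write to the daemon's log file (for example via `logging.basicConfig(filename=...)`). Re-raising after logging keeps the non-zero exit status, so systemd would still report the failure as in the quoted `status=1/FAILURE` line.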
On Tue, Nov 7, 2017 at 11:54 AM Dafna Ron <[email protected]> wrote:
> We had the same failure this morning:
>
> Failed build:
> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3646/
>
> All logs:
> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3646/artifact/
>
> Engine log:
> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3646/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/host-deploy/ovirt-host-deploy-20171107030411-lago-basic-suite-master-host-0-5f90b210.log
>
> Host logs:
> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3646/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/
>
> On 11/06/2017 08:26 PM, Nir Soffer wrote:
>
> On Mon, Nov 6, 2017 at 4:16 PM Yedidyah Bar David <[email protected]> wrote:
>
>> On Mon, Nov 6, 2017 at 1:57 PM, Dafna Ron <[email protected]> wrote:
>> > adding Didi.
>> >
>> > On 11/06/2017 11:51 AM, Ala Hino wrote:
>> >
>> > Suspected patch (https://gerrit.ovirt.org/#/c/83612/) is about cold
>> > merge and has nothing to do with host deploy.
>> >
>> > On Mon, Nov 6, 2017 at 1:39 PM, Dafna Ron <[email protected]> wrote:
>> >>
>> >> Hi,
>> >>
>> >> We failed test 002_bootstrap.verify_add_hosts
>> >>
>> >> I can see we only tried to install one of the hosts (host-0) and
>> >> failed. The second host has no log, which means we did not try to
>> >> deploy it.
>> >>
>> >> The error suggests that ovirt-imageio-daemon failed to start. However,
>> >> there is another message that I think should be addressed about
>> >> conflicting vdsm and libvirt configurations.
>> >>
>> >> Link to suspected patches: https://gerrit.ovirt.org/#/c/83612/
>> >>
>> >> Link to Job:
>> >> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/
>> >>
>> >> Link to all logs:
>> >> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/
>> >>
>> >> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/host-deploy/ovirt-host-deploy-20171106025647-lago-basic-suite-master-host-0-5530ab1f.log
>> >>
>> >> (Relevant) error snippet from the log:
>> >>
>> >> <error>
>> >>
>> >> 2017-11-06 02:56:46,526-0500 DEBUG otopi.plugins.ovirt_host_deploy.vdsm.packages plugin.execute:921 execute-output: ('/usr/bin/vdsm-tool', 'configure', '--force') stdout:
>> >>
>> >> Checking configuration status...
>> >>
>> >> abrt is not configured for vdsm
>> >> WARNING: LVM local configuration: /etc/lvm/lvmlocal.conf is not based on vdsm configuration
>> >> lvm requires configuration
>> >> libvirt is not configured for vdsm yet
>> >> FAILED: conflicting vdsm and libvirt-qemu tls configuration.
>> >> vdsm.conf with ssl=True requires the following changes:
>> >> libvirtd.conf: listen_tcp=0, auth_tcp="sasl", listen_tls=1
>> >> qemu.conf: spice_tls=1.
>> >> multipath requires configuration
>> >>
>> >> 2017-11-06 02:56:47,551-0500 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/usr/bin/systemctl', 'start', 'ovirt-imageio-daemon.service') stderr:
>> >> Job for ovirt-imageio-daemon.service failed because the control process exited with error code. See "systemctl status ovirt-imageio-daemon.service" and "journalctl -xe" for details.
>> >>
>> >> 2017-11-06 02:56:47,552-0500 DEBUG otopi.context context._executeMethod:143 method exception
>> >> Traceback (most recent call last):
>> >>   File "/tmp/ovirt-R4R8gZhaQI/pythonlib/otopi/context.py", line 133, in _executeMethod
>> >>     method['method']()
>> >>   File "/tmp/ovirt-R4R8gZhaQI/otopi-plugins/ovirt-host-deploy/vdsm/packages.py", line 179, in _start
>> >>     self.services.state('ovirt-imageio-daemon', True)
>> >>   File "/tmp/ovirt-R4R8gZhaQI/otopi-plugins/otopi/services/systemd.py", line 141, in state
>> >>     service=name,
>> >> RuntimeError: Failed to start service 'ovirt-imageio-daemon'
>> >> 2017-11-06 02:56:47,553-0500 ERROR otopi.context context._executeMethod:152 Failed to execute stage 'Closing up': Failed to start service 'ovirt-imageio-daemon'
>>
>> In /var/log/messages of the host [1], there is:
>>
>> Nov 6 02:56:47 lago-basic-suite-master-host-0 systemd: Starting oVirt ImageIO Daemon...
>> Nov 6 02:56:47 lago-basic-suite-master-host-0 python: detected unhandled Python exception in '/usr/bin/ovirt-imageio-daemon'
>> Nov 6 02:56:47 lago-basic-suite-master-host-0 python: can't communicate with ABRT daemon, is it running? [Errno 2] No such file or directory
>> Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: Traceback (most recent call last):
>> Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: File "/usr/bin/ovirt-imageio-daemon", line 14, in <module>
>> Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: server.main(sys.argv)
>> Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 57, in main
>> Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: start(config)
>> Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: File "/usr/lib/python2.7/site-packages/ovirt_imageio_daemon/server.py", line 85, in start
>> Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: WSGIRequestHandler)
>> Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
>> Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: self.server_bind()
>> Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: File "/usr/lib64/python2.7/wsgiref/simple_server.py", line 48, in server_bind
>> Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: HTTPServer.server_bind(self)
>> Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: File "/usr/lib64/python2.7/BaseHTTPServer.py", line 108, in server_bind
>> Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: SocketServer.TCPServer.server_bind(self)
>> Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind
>> Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: self.socket.bind(self.server_address)
>> Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: File "/usr/lib64/python2.7/socket.py", line 224, in meth
>> Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: return getattr(self._sock,name)(*args)
>> Nov 6 02:56:47 lago-basic-suite-master-host-0 ovirt-imageio-daemon: socket.error: [Errno 98] Address already in use
>> Nov 6 02:56:47 lago-basic-suite-master-host-0 systemd: ovirt-imageio-daemon.service: main process exited, code=exited, status=1/FAILURE
>>
>> ovirt-host-deploy stops it, and immediately tries to start it:
>>
>> 2017-11-06 02:56:47,203-0500 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/usr/bin/systemctl', 'stop', 'ovirt-imageio-daemon.service'), rc=0
>> ...
>> 2017-11-06 02:56:47,550-0500 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/usr/bin/systemctl', 'start', 'ovirt-imageio-daemon.service'), rc=1
>>
>> Also, imageio-daemon's log [2] looks a bit weird to me: it has 5
>> 'Starting' lines, but none of the other lines I would have expected from
>> reading its source, and which I can see in another run that did finish
>> successfully [3].
>>
>> Adding Idan, but not sure it's a bug in the daemon.
>>
>> [1] http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/
>>
>> [2] http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3626/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/ovirt-imageio-daemon/daemon.log
>>
>> [3] http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/3628/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/ovirt-imageio-daemon/daemon.log
>
> Looks like the daemon is already running on this host; maybe host deploy
> is trying to start the service twice?
>
> We did not change the startup code for a couple of years, so this must be
> some change in another component.
>
> This patch will make it easier to detect future issues by logging any error
> to the daemon log during startup:
> https://gerrit.ovirt.org/83670/
>
> Nir
>
>> >>
>> >> </error>
>>
>> --
>> Didi
>> _______________________________________________
>> Devel mailing list
>> [email protected]
>> http://lists.ovirt.org/mailman/listinfo/devel
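The "[Errno 98] Address already in use" at the end of the quoted daemon traceback is the error bind() reports when another socket is still listening on the same address, which would fit the guess that an older daemon instance was still running when host deploy ran "systemctl start". A minimal Python 3 sketch that reproduces the error locally, assuming a free port picked by the kernel rather than the daemon's real port (try_bind() is a hypothetical helper):

```python
import errno
import socket


def try_bind(port, host="127.0.0.1"):
    """Hypothetical probe: try to bind a TCP socket to (host, port).

    Returns None if the bind succeeded, otherwise the errno of the
    failure. The probe socket is always closed before returning.
    """
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((host, port))
        return None
    except OSError as e:
        return e.errno
    finally:
        probe.close()


# Simulate a daemon instance that is still running and listening.
old = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
old.bind(("127.0.0.1", 0))  # port 0: the kernel picks a free port
old.listen(1)
port = old.getsockname()[1]

# While the old listener is alive, a second bind to the same address
# fails with EADDRINUSE (errno 98 on Linux, as in the traceback).
in_use = try_bind(port)

# After the old listener is closed (it never accepted a connection, so
# no TIME_WAIT state is left behind), the port can be bound again.
old.close()
freed = try_bind(port)

print("while running:", errno.errorcode.get(in_use), "after close:", freed)
```

On an affected host, `ss -ltnp` (run as root to see other users' processes) lists listening TCP sockets together with the owning process, which may show whether something still holds the daemon's port when the start fails.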
