On Sun, Nov 26, 2017 at 7:24 PM, Nir Soffer <[email protected]> wrote:
> I think we need to check and report which process is listening on a port
> when starting a server on that port fails.
How do you know that a server was "started on that port", and that it
failed specifically because it failed to bind? There is no standardized
(Unix) way to mark that a service wants to listen on a specific port, or
that it failed because a specific port was bound by some other process.

There are various classical *inetd* daemons, and modern systemd.socket,
that listen *instead* of some service. They can then manage the port
resources and perhaps do something intelligent about them.

>
> Didi, do you think we can integrate this in the deploy code, or this
> should be implemented in each server?

It should be quite easy to patch otopi's services.state to run something
if start fails, e.g. 'ss -anp' or whatever you want. It should even be
not too hard to do this in a self-contained plugin, so that it can be part
of otopi-debug-plugins.

If we decide that something needs to be implemented by each server,
perhaps that "something" should be having it controlled by a
systemd.socket unit. I didn't try, though, to see what this actually
buys us.

>
> Maybe when deployment fails, the deploy code can report all the
> listening sockets and the processes bound to these sockets?

Pushed now:

https://gerrit.ovirt.org/84699 core: Name TRANSACTION_INIT
https://gerrit.ovirt.org/84700 plugins: debug: Add debug_failure
https://gerrit.ovirt.org/84701 automation: Test failure

Will merge soon, if all goes well.

Feel free to open a BZ for other things discussed above, if relevant.

>
> Nir
>
> On Sun, Nov 26, 2017 at 7:11 PM Gal Ben Haim <[email protected]> wrote:
>>
>> The failure is not consistent.
>>
>> On Sun, Nov 26, 2017 at 5:33 PM, Yaniv Kaul <[email protected]> wrote:
>>>
>>>
>>>
>>> On Sun, Nov 26, 2017 at 4:53 PM, Gal Ben Haim <[email protected]>
>>> wrote:
>>>>
>>>> We still see this issue on the upgrade suite from latest release to
>>>> master [1].
>>>> I don't see any evidence in "/var/log/messages" [2] that
>>>> "ovirt-imageio-proxy" was started twice.
>>>
>>>
>>> Since it's not a registered port and it's a high port, could it be used
>>> by something else (what are the odds, though)?
>>> Is it consistent?
>>> Y.
>>>
>>>>
>>>>
>>>> [1]
>>>> http://jenkins.ovirt.org/blue/rest/organizations/jenkins/pipelines/ovirt-master_change-queue-tester/runs/4153/nodes/123/steps/241/log/?start=0
>>>>
>>>> [2]
>>>> http://jenkins.ovirt.org/view/Change%20queue%20jobs/job/ovirt-master_change-queue-tester/4153/artifact/exported-artifacts/upgrade-from-release-suit-master-el7/test_logs/upgrade-from-release-suite-master/post-001_initialize_engine.py/lago-upgrade-from-release-suite-master-engine/_var_log/messages/*view*/
>>>>
>>>> On Fri, Nov 24, 2017 at 8:16 PM, Dafna Ron <[email protected]> wrote:
>>>>>
>>>>> There were two different patches reported as failing CQ today with the
>>>>> ovirt-imageio-proxy service failing to start.
>>>>>
>>>>> Here is the latest failure:
>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4130/artifact
>>>>>
>>>>>
>>>>> On 11/23/2017 03:39 PM, Allon Mureinik wrote:
>>>>>
>>>>> Daniel/Nir?
>>>>>
>>>>> On Thu, Nov 23, 2017 at 5:29 PM, Dafna Ron <[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We have a failure in test
>>>>>> 001_initialize_engine.test_initialize_engine.
>>>>>>
>>>>>> This is failing with the error: Failed to start service
>>>>>> 'ovirt-imageio-proxy
>>>>>>
>>>>>>
>>>>>> Link and headline of suspected patches:
>>>>>>
>>>>>> build: Make resulting RPMs architecture-specific -
>>>>>> https://gerrit.ovirt.org/#/c/84534/
>>>>>>
>>>>>>
>>>>>> Link to Job:
>>>>>>
>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4055
>>>>>>
>>>>>>
>>>>>> Link to all logs:
>>>>>>
>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4055/artifact/
>>>>>>
>>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4055/artifact/exported-artifacts/upgrade-from-release-suit-master-el7/test_logs/upgrade-from-release-suite-master/post-001_initialize_engine.py/lago-upgrade-from-release-suite-master-engine/_var_log/messages/*view*/
>>>>>>
>>>>>>
>>>>>> (Relevant) error snippet from the log:
>>>>>>
>>>>>> <error>
>>>>>>
>>>>>>
>>>>>> from lago log:
>>>>>>
>>>>>> Failed to start service 'ovirt-imageio-proxy
>>>>>>
>>>>>> messages logs:
>>>>>>
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: Starting Session 8 of user root.
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: Traceback (most recent call last):
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/bin/ovirt-imageio-proxy", line 85, in <module>
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: status = image_proxy.main(args, config)
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/image_proxy.py", line 21, in main
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: image_server.start(config)
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/server.py", line 45, in start
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: WSGIRequestHandler)
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: self.server_bind()
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/wsgiref/simple_server.py", line 48, in server_bind
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: HTTPServer.server_bind(self)
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/BaseHTTPServer.py", line 108, in server_bind
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: SocketServer.TCPServer.server_bind(self)
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: self.socket.bind(self.server_address)
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/socket.py", line 224, in meth
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: return getattr(self._sock,name)(*args)
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: socket.error: [Errno 98] Address already in use
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service: main process exited, code=exited, status=1/FAILURE
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: Failed to start oVirt ImageIO Proxy.
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: Unit ovirt-imageio-proxy.service entered failed state.
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service failed.
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service holdoff time over, scheduling restart.
>>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: Starting oVirt ImageIO Proxy...
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: Traceback (most recent call last):
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/bin/ovirt-imageio-proxy", line 85, in <module>
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: status = image_proxy.main(args, config)
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/image_proxy.py", line 21, in main
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: image_server.start(config)
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/server.py", line 45, in start
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: WSGIRequestHandler)
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: self.server_bind()
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/wsgiref/simple_server.py", line 48, in server_bind
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: HTTPServer.server_bind(self)
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/BaseHTTPServer.py", line 108, in server_bind
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: SocketServer.TCPServer.server_bind(self)
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: self.socket.bind(self.server_address)
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/socket.py", line 224, in meth
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: return getattr(self._sock,name)(*args)
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: socket.error: [Errno 98] Address already in use
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service: main process exited, code=exited, status=1/FAILURE
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: Failed to start oVirt ImageIO Proxy.
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: Unit ovirt-imageio-proxy.service entered failed state.
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service failed.
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service holdoff time over, scheduling restart.
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: start request repeated too quickly for ovirt-imageio-proxy.service
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: Failed to start oVirt ImageIO Proxy.
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: Unit ovirt-imageio-proxy.service entered failed state.
>>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service failed.
>>>>>>
>>>>>> </error>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Infra mailing list
>>>>>> [email protected]
>>>>>> http://lists.ovirt.org/mailman/listinfo/infra
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Devel mailing list
>>>>> [email protected]
>>>>> http://lists.ovirt.org/mailman/listinfo/devel
>>>>
>>>>
>>>> --
>>>> GAL bEN HAIM
>>>> RHV DEVOPS
>>>
>>>
>>
>>
>> --
>> GAL bEN HAIM
>> RHV DEVOPS

--
Didi
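A minimal sketch of what Nir asked for and Didi suggested ("run something if start fails, e.g. 'ss -anp'"), written as standalone Python 3 rather than as an otopi plugin. It does not use otopi's services API or the debug_failure plugin from the patches above, and the helper names `try_bind`, `listening_sockets` and `bind_or_report` are hypothetical. It repeats the bind step that fails in the traceback (`[Errno 98] Address already in use` is `errno.EADDRINUSE` on Linux) and, on that error, dumps `ss -ltnp` instead of `ss -anp`. The `-l` flag limits the output to listening sockets. `-p` usually needs root to show processes owned by other users.

```python
import errno
import socket
import subprocess
import sys


def try_bind(port, host=""):
    """Bind and listen on (host, port); return the socket, or None if the
    address is already in use (EADDRINUSE, errno 98 on Linux)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # BaseHTTPServer.HTTPServer (and so wsgiref's WSGIServer) sets this too.
    # On Linux it still refuses a port another socket is listening on.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        sock.bind((host, port))
        sock.listen(5)
    except OSError as e:
        sock.close()
        if e.errno == errno.EADDRINUSE:
            return None
        raise
    return sock


def listening_sockets(cmd=("ss", "-ltnp")):
    """Return the output of a socket-listing command. The default lists
    listening TCP sockets with numeric ports and owning processes."""
    try:
        result = subprocess.run(list(cmd), stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT,
                                universal_newlines=True)
    except FileNotFoundError:
        return "%s: command not found\n" % cmd[0]
    return result.stdout


def bind_or_report(port, host=""):
    """Like try_bind, but print the current listeners when the bind fails."""
    sock = try_bind(port, host)
    if sock is None:
        print("port %d is already in use; listening sockets:" % port,
              file=sys.stderr)
        print(listening_sockets(), file=sys.stderr)
    return sock
```

Call `bind_or_report(port)` with the port the proxy is configured to listen on. The thread does not state that port number.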
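Yaniv asked whether an unregistered high port could be taken by something else. On Linux, one concrete thing to check is the kernel's local port range. The kernel picks source ports for outgoing connections, and for `bind()` to port 0, from `/proc/sys/net/ipv4/ip_local_port_range` (sysctl `net.ipv4.ip_local_port_range`). If a server's port falls inside that range, an unrelated outgoing connection may hold it for a while. `net.ipv4.ip_local_reserved_ports` exists to exclude such ports from automatic assignment. The thread neither states the proxy's port nor confirms this as the cause. The sketch below only checks whether a given port lies in the range, and `parse_port_range`, `port_in_range` and `in_ephemeral_range` are hypothetical helpers.

```python
PORT_RANGE_PATH = "/proc/sys/net/ipv4/ip_local_port_range"  # Linux only


def parse_port_range(text):
    """Parse ip_local_port_range contents such as "32768\t60999\n" into an
    inclusive (low, high) tuple."""
    low, high = (int(field) for field in text.split())
    return low, high


def port_in_range(port, text):
    """True if port lies inside the inclusive range described by text."""
    low, high = parse_port_range(text)
    return low <= port <= high


def in_ephemeral_range(port, path=PORT_RANGE_PATH):
    """Check port against the running kernel's local (ephemeral) port range."""
    with open(path) as f:
        return port_in_range(port, f.read())
```

Keeping the parsing separate from the file read makes it possible to check a range captured from a test VM without running on that VM.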
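Didi's other idea was to have a systemd.socket unit own the port. In that setup systemd creates the listening socket itself and hands it to the service as an inherited file descriptor. The descriptors are numbered from 3 and announced through the `LISTEN_PID` and `LISTEN_FDS` environment variables, following the protocol documented in sd_listen_fds(3). As Didi says, this was not tried here. The sketch is a hypothetical receiving side in plain Python 3.7+, not ovirt-imageio-proxy's code, and `listen_fds` and `server_socket` are illustrative names. A port conflict would then presumably surface when systemd starts the .socket unit, rather than as a traceback in the service.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first inherited fd, per sd_listen_fds(3)


def listen_fds(environ=None, pid=None):
    """Return the fds passed by systemd socket activation, or [].

    The fds are meant for this process only if LISTEN_PID equals its pid;
    LISTEN_FDS is how many were passed, numbered consecutively from 3.
    """
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        if int(environ["LISTEN_PID"]) != pid:
            return []
        count = int(environ["LISTEN_FDS"])
    except (KeyError, ValueError):
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def server_socket(port):
    """Use the socket systemd bound for us if there is one, else bind."""
    fds = listen_fds()
    if fds:
        # Python 3.7+: family and type are auto-detected from the fd.
        return socket.socket(fileno=fds[0])
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.listen(5)
    return sock
```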
