On Mon, Nov 27, 2017 at 10:38 AM, Yedidyah Bar David <[email protected]> wrote:
> On Sun, Nov 26, 2017 at 7:24 PM, Nir Soffer <[email protected]> wrote:
> > I think we need to check and report which process is listening on a port
> > when starting a server on that port fail.
>
> How do you know that a server was "started on that port", and that
> if failed specifically because it failed to bind?
>
> There is no standardized (Unix) way to mark that a service wants to
> listen on a specific port, or that it failed because a specific port
> was bound by some other process.
>
> There are various classical *inetd* daemons, and modern systemd.socket,
> that listen *instead* of some service. Then they can manage the port
> resources and perhaps do something intelligent about them.
>
> >
> > Didi, do you think we can integrate this in the deploy code, or this
> > should be implemented in each server?
>
> It should be quite easy to patch otopi's services.state to run something
> if start fails, e.g. 'ss -anp' or whatever you want.
>
> It should even be not-too-hard to do this in a self-contained plugin,
> so can be part of otopi-debug-plugins.
>
> If we decide that something needs to be implemented by each server,
> perhaps "something" should be to be controlled by a systemd.socket unit.
> Didn't try, though, to see what this actually buys us.
>
> >
> > Maybe when deployment fails, the deploy code can report all the
> > listening sockets and the processes bound to these sockets?
>
> Pushed now:
>
> https://gerrit.ovirt.org/84699 core: Name TRANSACTION_INIT
> https://gerrit.ovirt.org/84700 plugins: debug: Add debug_failure
> https://gerrit.ovirt.org/84701 automation: Test failure
>
> Will merge soon, if all goes well.
>

Merged them.

Pushed to OST:

https://gerrit.ovirt.org/84710

Dafna - thanks for opening the bug on ovirt-imageio, but I am not sure anyone
can do much about it without more info, such as might be provided by above
patches.
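
As a rough illustration of the kind of information meant here, the sketch
below tries to bind a TCP port and, if that fails with "Address already in
use", attaches the output of 'ss -anp' (the command mentioned above) to the
error text. It is not the code from the gerrit patches and not how otopi or
ovirt-imageio-proxy work; the names listeners_report and try_bind are
hypothetical, and it assumes Python 3.7+ rather than the Python 2.7 seen in
the traceback.

```python
import errno
import socket
import subprocess


def listeners_report(command=("ss", "-anp")):
    """Return the output of a socket-listing command (assumed: 'ss -anp').

    If the command is missing or hangs, return a short note instead of raising.
    """
    try:
        result = subprocess.run(
            list(command), capture_output=True, text=True, timeout=10, check=False
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "could not run %s: %s" % (" ".join(command), exc)
    return result.stdout


def try_bind(host, port, report=listeners_report):
    """Try to bind a TCP socket (hypothetical helper).

    Returns (socket, "") on success, or (None, diagnostic text) when the
    address is already in use. Any other bind error is re-raised.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        sock.close()
        if exc.errno != errno.EADDRINUSE:
            raise
        message = "port %d already in use; sockets at failure time:\n%s" % (
            port,
            report(),
        )
        return None, message
    return sock, ""
```

Note that 'ss -p' generally needs root to show processes owned by other
users, and the listing is only a snapshot: whatever held the port may be
gone by the time the command runs.
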
When I suggested below to open BZ I meant on otopi or host-deploy to provide
more debugging info, not for imageio - obviously no harm in opening it, and
it's good to have it even if only for reference.

>
> Feel free to open BZ for other things discussed above, if relevant.
>
> >
> > Nir
> >
> > On Sun, Nov 26, 2017 at 7:11 PM Gal Ben Haim <[email protected]> wrote:
> >>
> >> The failure is not consistent.
> >>
> >> On Sun, Nov 26, 2017 at 5:33 PM, Yaniv Kaul <[email protected]> wrote:
> >>>
> >>> On Sun, Nov 26, 2017 at 4:53 PM, Gal Ben Haim <[email protected]>
> >>> wrote:
> >>>>
> >>>> We still see this issue on the upgrade suite from latest release to
> >>>> master [1].
> >>>> I don't see any evidence in "/var/log/messages" [2] that
> >>>> "ovirt-imageio-proxy" was started twice.
> >>>
> >>> Since it's not a registered port and a high port, could it be used by
> >>> something else (what are the odds though ?
> >>> Is it consistent?
> >>> Y.
> >>>
> >>>>
> >>>> [1]
> >>>> http://jenkins.ovirt.org/blue/rest/organizations/jenkins/pipelines/ovirt-master_change-queue-tester/runs/4153/nodes/123/steps/241/log/?start=0
> >>>>
> >>>> [2]
> >>>> http://jenkins.ovirt.org/view/Change%20queue%20jobs/job/ovirt-master_change-queue-tester/4153/artifact/exported-artifacts/upgrade-from-release-suit-master-el7/test_logs/upgrade-from-release-suite-master/post-001_initialize_engine.py/lago-upgrade-from-release-suite-master-engine/_var_log/messages/*view*/
> >>>>
> >>>> On Fri, Nov 24, 2017 at 8:16 PM, Dafna Ron <[email protected]> wrote:
> >>>>>
> >>>>> there were two different patches reported as failing cq today with the
> >>>>> ovirt-imageio-proxy service failing to start.
> >>>>>
> >>>>> Here is the latest failure:
> >>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4130/artifact
> >>>>>
> >>>>> On 11/23/2017 03:39 PM, Allon Mureinik wrote:
> >>>>>
> >>>>> Daniel/Nir?
> >>>>>
> >>>>> On Thu, Nov 23, 2017 at 5:29 PM, Dafna Ron <[email protected]> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> We have a failing on test
> >>>>>> 001_initialize_engine.test_initialize_engine.
> >>>>>>
> >>>>>> This is failing with error Failed to start service
> >>>>>> 'ovirt-imageio-proxy
> >>>>>>
> >>>>>> Link and headline ofto suspected patches:
> >>>>>>
> >>>>>> build: Make resulting RPMs architecture-specific -
> >>>>>> https://gerrit.ovirt.org/#/c/84534/
> >>>>>>
> >>>>>> Link to Job:
> >>>>>>
> >>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4055
> >>>>>>
> >>>>>> Link to all logs:
> >>>>>>
> >>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4055/artifact/
> >>>>>>
> >>>>>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/4055/artifact/exported-artifacts/upgrade-from-release-suit-master-el7/test_logs/upgrade-from-release-suite-master/post-001_initialize_engine.py/lago-upgrade-from-release-suite-master-engine/_var_log/messages/*view*/
> >>>>>>
> >>>>>> (Relevant) error snippet from the log:
> >>>>>>
> >>>>>> <error>
> >>>>>>
> >>>>>> from lago log:
> >>>>>>
> >>>>>> Failed to start service 'ovirt-imageio-proxy
> >>>>>>
> >>>>>> messages logs:
> >>>>>>
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: Starting Session 8 of user root.
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: Traceback (most recent call last):
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/bin/ovirt-imageio-proxy", line 85, in <module>
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: status = image_proxy.main(args, config)
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/image_proxy.py", line 21, in main
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: image_server.start(config)
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/server.py", line 45, in start
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: WSGIRequestHandler)
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: self.server_bind()
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/wsgiref/simple_server.py", line 48, in server_bind
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: HTTPServer.server_bind(self)
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/BaseHTTPServer.py", line 108, in server_bind
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: SocketServer.TCPServer.server_bind(self)
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: self.socket.bind(self.server_address)
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/socket.py", line 224, in meth
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: return getattr(self._sock,name)(*args)
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: socket.error: [Errno 98] Address already in use
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service: main process exited, code=exited, status=1/FAILURE
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: Failed to start oVirt ImageIO Proxy.
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: Unit ovirt-imageio-proxy.service entered failed state.
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service failed.
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service holdoff time over, scheduling restart.
> >>>>>> Nov 23 07:30:47 lago-upgrade-from-release-suite-master-engine systemd: Starting oVirt ImageIO Proxy...
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: Traceback (most recent call last):
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/bin/ovirt-imageio-proxy", line 85, in <module>
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: status = image_proxy.main(args, config)
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/image_proxy.py", line 21, in main
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: image_server.start(config)
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib/python2.7/site-packages/ovirt_imageio_proxy/server.py", line 45, in start
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: WSGIRequestHandler)
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: self.server_bind()
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/wsgiref/simple_server.py", line 48, in server_bind
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: HTTPServer.server_bind(self)
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/BaseHTTPServer.py", line 108, in server_bind
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: SocketServer.TCPServer.server_bind(self)
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: self.socket.bind(self.server_address)
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: File "/usr/lib64/python2.7/socket.py", line 224, in meth
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: return getattr(self._sock,name)(*args)
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine ovirt-imageio-proxy: socket.error: [Errno 98] Address already in use
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service: main process exited, code=exited, status=1/FAILURE
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: Failed to start oVirt ImageIO Proxy.
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: Unit ovirt-imageio-proxy.service entered failed state.
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service failed.
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service holdoff time over, scheduling restart.
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: start request repeated too quickly for ovirt-imageio-proxy.service
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: Failed to start oVirt ImageIO Proxy.
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: Unit ovirt-imageio-proxy.service entered failed state.
> >>>>>> Nov 23 07:30:48 lago-upgrade-from-release-suite-master-engine systemd: ovirt-imageio-proxy.service failed.
> >>>>>>
> >>>>>> </error>
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Infra mailing list
> >>>>>> [email protected]
> >>>>>> http://lists.ovirt.org/mailman/listinfo/infra
> >>>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> Devel mailing list
> >>>>> [email protected]
> >>>>> http://lists.ovirt.org/mailman/listinfo/devel
> >>>>
> >>>> --
> >>>> GAL bEN HAIM
> >>>> RHV DEVOPS
> >>>>
> >>>> _______________________________________________
> >>>> Devel mailing list
> >>>> [email protected]
> >>>> http://lists.ovirt.org/mailman/listinfo/devel
> >>>
> >>
> >> --
> >> GAL bEN HAIM
> >> RHV DEVOPS
> >> _______________________________________________
> >> Devel mailing list
> >> [email protected]
> >> http://lists.ovirt.org/mailman/listinfo/devel
> >
>
> --
> Didi
>

--
Didi
