Hello,
We are seeing an occasional problem where a restart of funcd on a
minion fails: the func daemon is stopped but is then unable to start
again.
Checking func.log gives:
2011-10-02 04:02:04,321 - INFO - Exception occured: socket.error
2011-10-02 04:02:04,321 - INFO - Exception value: (98, 'Address already in use')
2011-10-02 04:02:04,322 - INFO - Exception Info:
  File "/usr/bin/funcd", line 23, in ?
    server.main(sys.argv)
  File "/usr/lib/python2.4/site-packages/func/minion/server.py", line 413, in main
    serve()
  File "/usr/lib/python2.4/site-packages/func/minion/server.py", line 225, in serve
    server = setup_server()
  File "/usr/lib/python2.4/site-packages/func/minion/server.py", line 220, in setup_server
    server = FuncSSLXMLRPCServer((listen_addr, listen_port), config.module_list)
  File "/usr/lib/python2.4/site-packages/func/minion/server.py", line 279, in __init__
    self.ca)
  File "/usr/lib/python2.4/site-packages/func/minion/AuthedXMLRPCServer.py", line 74, in __init__
    SimpleXMLRPCServer.SimpleXMLRPCServer.__init__(self, address, AuthedSimpleXMLRPCRequestHandler)
  File "/usr/lib64/python2.4/SimpleXMLRPCServer.py", line 473, in __init__
    SocketServer.TCPServer.__init__(self, addr, requestHandler)
  File "/usr/lib64/python2.4/SocketServer.py", line 330, in __init__
    self.server_bind()
  File "/usr/lib64/python2.4/SocketServer.py", line 341, in server_bind
    self.socket.bind(self.server_address)
  File "<string>", line 1, in bind
As you may guess from the timestamp, we see this problem most often at
4:02am on Sundays, i.e. during the logrotate run for the func logs.
So far, logging in to the server and starting the func service once we
spot that it is stopped has always worked, with no need to remove any
pid or lock file by hand.
One theory is that this problem occurs when the func minion is
processing a command and is told to restart partway through. From
watching netstat, it looks like the func daemon stops listening on the
minion port to allow the spawned process to communicate with the master.
If the daemon is stopped at that point, the spawned process blocks a new
daemon from starting ('Address already in use'), but the spawned process
then exits and we're left with no daemon running.
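To illustrate the mechanism behind that theory, here is a minimal sketch, not func code: `holder` is a hypothetical stand-in for the spawned process that still has the minion port open, and `bind_listener` and `demo` are names made up for this example. It only shows that a second bind on a port that is still held fails with EADDRINUSE (errno 98 on Linux) and succeeds once the holder has gone away. It does not try to reproduce how funcd hands the socket to its child.

```python
import errno
import socket


def bind_listener(port):
    """Bind and listen on 127.0.0.1:port; port 0 lets the OS pick a free one."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("127.0.0.1", port))
        s.listen(5)
    except socket.error:
        s.close()
        raise
    return s


def demo():
    # Hypothetical stand-in for the spawned process still holding the port.
    holder = bind_listener(0)
    port = holder.getsockname()[1]

    # A "new daemon" tries to start while the port is still held.
    try:
        bind_listener(port).close()
        first_attempt = None
    except socket.error as e:
        first_attempt = e.errno

    # The holder exits; a later start attempt now gets the port.
    holder.close()
    bind_listener(port).close()
    return first_attempt


if __name__ == "__main__":
    print(demo() == errno.EADDRINUSE)
```

If the theory is right, the restart loses a race of this kind: the start attempt happens while the port is still held, and nothing retries after it is released.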
Does this ring any bells with anyone? Is this a known bug?
We've already added monit to mop up after this, but we would much
prefer to find a proper fix.
Alison