Abnormal shutdown leaves the pidfile, which prevents subsequent startup
-----------------------------------------------------------------------
Key: DAEMON-183
URL: https://issues.apache.org/jira/browse/DAEMON-183
Project: Commons Daemon
Issue Type: Bug
Components: Procrun
Affects Versions: 1.0.3
Reporter: Steve Ash
Priority: Trivial
This is really a trivial issue, so you may want to just close it as WONTFIX,
but it does represent an inconsistency that I don't feel I can release into
production, so I'm documenting it here.
When using the pidfile with procrun, if the pidfile isn't deleted at shutdown,
the next startup fails with a message indicating that a pid file exists. Because
I had configured the service incorrectly (my stopmode was not set, so my main
thread never returned and the stop timed out), the pidfile was often left
behind after the service came down. This in and of itself seems like it may be
an issue. Nonetheless, on a subsequent startup, it failed indicating that a
pidfile existed, but then deleted the existing pidfile. So a second attempt to
start would succeed. It just felt a little strange that it would fail the
first time and then work the second time. I don't really know if it's wrong,
but I know that my customers would feel this is fragile/weird. Thus, I am just
not using the pidfile.
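
To make the sequence concrete, here is a toy model of what I think I am
seeing. It is a language-neutral sketch in Python, not procrun's actual C
code: recorded_pidfile is a hypothetical stand-in for the gPidfileName global
mentioned in point 3 below, and all function names are made up for
illustration.

```python
import os

# Hypothetical stand-in for procrun's gPidfileName global.
recorded_pidfile = None


def try_start(pidfile):
    """Model one startup: record the pidfile name, then check for a leftover."""
    global recorded_pidfile
    recorded_pidfile = pidfile       # name is recorded BEFORE the existence check
    if os.path.exists(pidfile):
        return False                 # "pid file exists": startup is aborted
    with open(pidfile, "w") as f:
        f.write(str(os.getpid()))
    return True


def cleanup():
    """Model exit cleanup: delete whatever pidfile name was recorded."""
    global recorded_pidfile
    if recorded_pidfile and os.path.exists(recorded_pidfile):
        os.remove(recorded_pidfile)
    recorded_pidfile = None


def attempt_start(pidfile):
    """One user-visible start attempt; an aborted start still runs cleanup."""
    ok = try_start(pidfile)
    if not ok:
        cleanup()                    # removes a pidfile this attempt never created
    return ok
```

With a leftover pidfile in place, the first attempt_start() call in this model
returns False and removes the file, and the second call returns True, which
matches the fail-then-succeed pattern described above.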
So a few thoughts:
1) Should the pidfile check go further and query for a running process with the
expected image (servicename.exe) and process id? If no such process exists,
procrun could assume the pidfile is orphaned, delete it, and then continue
startup.
2) Obviously, if SCM or an external user kills the process, then you can't
delete the file. But the timeout that I experienced came, I think, from SCM and
not from the timeout in serviceStop (e.g. I don't think I had a "Worker was
killed" message). So are you aware of a problem with the timeout logic where
the SCM will force the process down instead of waiting for procrun to time out?
3) Today, if the process aborts startup because the pidfile already exists, the
gPidfileName global has already been set, and thus it deletes the pidfile
(which is why the second attempt to start succeeds). What happens if this
pidfile represents a real, already running process? Is the other process
locking it, so that the delete would fail? Or would the delete succeed,
removing the pidfile and allowing multiple concurrent instances to run?
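
For points 1 and 3, this is roughly the behavior I have in mind, again as a
Python sketch rather than a patch against procrun. The names claim_pidfile and
release_pidfile are hypothetical, and is_our_service_running is an assumed
callback that would have to check both the process id and the image name using
whatever the platform provides; I have deliberately left that part out.

```python
import os


def claim_pidfile(path, my_pid, is_our_service_running):
    """Try to claim the pidfile; return True only if this instance now owns it.

    is_our_service_running(pid) is a hypothetical callback that should report
    whether `pid` is a live process with the expected image name.
    """
    for _ in range(2):  # at most one retry after removing an orphan
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            try:
                with open(path) as f:
                    old_pid = int(f.read().strip())
            except (FileNotFoundError, ValueError):
                old_pid = None  # vanished or unparsable: treat as orphaned
            if old_pid is not None and is_our_service_running(old_pid):
                return False    # live instance: refuse, and leave its file alone
            try:
                os.remove(path)  # orphaned pidfile: remove it and retry
            except FileNotFoundError:
                pass
            continue
        with os.fdopen(fd, "w") as f:
            f.write(str(my_pid))
        return True
    return False


def release_pidfile(path, owned):
    """Delete the pidfile on exit only if this instance created it."""
    if owned and os.path.exists(path):
        os.remove(path)
```

The intent is that the return value of claim_pidfile() is what gets passed to
release_pidfile() as owned, so an aborted startup never deletes a pidfile that
belongs to another instance. This does not close every race between two
instances starting at the same moment; it only shows the ordering I am
suggesting.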
Just a few minor things. If you feel any of these things are worth
implementing/changing, I would be happy to work on it and submit a patch. If
not, no worries.