Abnormal shutdown leaves the pidfile, which prevents subsequent startup
-----------------------------------------------------------------------

                 Key: DAEMON-183
                 URL: https://issues.apache.org/jira/browse/DAEMON-183
             Project: Commons Daemon
          Issue Type: Bug
          Components: Procrun
    Affects Versions: 1.0.3
            Reporter: Steve Ash
            Priority: Trivial


This is really a trivial issue, so you may want to just close it as WONTFIX, but 
it does represent an inconsistency that I don't feel I can release into 
production, so I'm documenting it here.

When using a pidfile with procrun, if the pidfile isn't deleted, the next 
startup fails with an error indicating that the pidfile exists. Because I had 
misconfigured the service (StopMode was not set, so my main thread never 
returned and the stop timed out), the pidfile was nearly always left behind 
after the service came down. This in itself seems like it may be an issue.

Nonetheless, on a subsequent startup it failed, reporting that a pidfile 
existed, but it then deleted the existing pidfile, so a second attempt to start 
succeeded. It just felt a little strange that it would fail the first time and 
then work the second time. I don't really know if it's wrong, but I know that 
my customers would find this fragile and weird. Thus, I am simply not using the 
pidfile.

So a few thoughts:

1) Should the pidfile check go further and query for a running process with the 
expected image (servicename.exe) and process ID? If no such process exists, it 
could assume the pidfile is orphaned, delete it, and continue startup.
2) Obviously, if the SCM or an external user kills the process, the file can't 
be deleted. However, I think the timeout I experienced came from the SCM, not 
from the timeout in serviceStop (I don't think I saw a "Worker was killed" 
message). Are you aware of a problem with the timeout logic where the SCM 
forces the process down instead of waiting for procrun to time out?
3) Today, if the process aborts startup because the pidfile already exists, the 
gPidfileName global has already been set, so the aborting process deletes the 
pidfile (which is why the second attempt to start succeeds). What happens if 
this pidfile belongs to a process that is really still running? Does the other 
process hold a lock on it, so that the delete fails? Or would the delete 
succeed, allowing multiple concurrent instances to run?

Just a few minor things. If you feel any of these are worth 
implementing/changing, I would be happy to work on them and submit a patch. If 
not, no worries.
