Re: [Mongrel] Why not ignore stale PID files?

Hongli Lai Tue, 10 Jun 2008 16:26:09 -0700

Zed A. Shaw wrote:
> That would be the ideal situation, but Ruby doesn't have good enough
> process management APIs to do this portably.


Erik Hetzner:
> ... but not the edge case where a process is running, with
> the same owner, but is no longer a mongrel process.

I feel obligated to reply. :) PID files suck. I think it's really stupid that modern operating systems don't provide some kind of mechanism to automatically delete a file when a process exits (even when it exits abnormally).

Anyway, I've written a fair share of daemons in the past. What I tend to do is to combine PID files with a number of lock files:

- foo.pid. This is obviously the PID file.

- foo.lock. This is a lock file whose lock is held for the lifetime of the daemon. If the daemon exits, whether normally or abnormally, the lock on that file is released. To check whether foo.pid is stale, we simply check whether foo.lock is locked.

The only way to check whether foo.lock is locked is to try to lock it with the non-blocking parameter. If locking fails, then it's already locked, meaning that the PID file is not stale. However, this could result in a race condition. Suppose that you are starting a daemon while simultaneously checking whether the daemon is already started:

1. The checker acquires a non-blocking lock on foo.lock. This succeeds, so it knows that the PID file is stale. It prints "stale PID file detected" on screen, and is about to release the lock on foo.lock.
2. All of a sudden, before the lock is released, a context switch occurs. The daemon that is being started tries to acquire a lock on foo.lock. This fails because the checker still has the lock, so the daemon thinks that there's already a daemon running, and exits.
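
To make the non-blocking probe concrete, here is a minimal Ruby sketch. The helper name locked? is made up for illustration, and it assumes a platform where File#flock behaves like flock(2), that is, locks belong to the open file handle rather than to the process. Note that it has exactly the race described above: while the probe briefly holds the lock, a starting daemon could see the file as locked.

```ruby
# Hypothetical helper (not from the original post): reports whether some
# other open handle currently holds a lock on `path`.
def locked?(path)
  # Create the file if it is missing, mirroring the lock() pseudocode.
  File.open(path, File::RDWR | File::CREAT, 0644) do |f|
    # File#flock returns 0 (truthy in Ruby) on success and false when
    # LOCK_NB is given and the call would otherwise have blocked.
    if f.flock(File::LOCK_EX | File::LOCK_NB)
      f.flock(File::LOCK_UN) # we only wanted to probe, so let go again
      false
    else
      true
    end
  end
end
```

Calling locked?("foo.lock") returns true only while another handle holds the lock, so a true result means the PID file is not stale.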


So we need another lock file to serialize all PID file related actions:
- foo.global.lock

So the code for checking whether the daemon's running is something like this:

  def check():
     lock(foo.global.lock)
     if try_lock(foo.lock):
        # Locking succeeded, so we have a stale PID file here.
        unlock(foo.lock)
        unlock(foo.global.lock)
        return nil
     else:
        # Locking failed. Process is still running.
        # Of course, your code should also check whether the PID file
        # actually exists.
        pid = read_pid_file(foo.pid)
        unlock(foo.global.lock)
        return pid
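
For what it's worth, the check above might look roughly like this in real Ruby. This is only a sketch: the method name check_daemon and the base-path convention (base + ".pid", ".lock", ".global.lock") are assumptions for illustration, and returning nil when the PID file is missing is one possible choice, not something the pseudocode prescribes.

```ruby
# Hypothetical sketch of check(): returns the daemon's PID, or nil if the
# PID file is stale or missing.
def check_daemon(base)
  flags = File::RDWR | File::CREAT
  File.open("#{base}.global.lock", flags, 0644) do |global|
    global.flock(File::LOCK_EX) # serialize all PID-file-related actions
    File.open("#{base}.lock", flags, 0644) do |lock|
      if lock.flock(File::LOCK_EX | File::LOCK_NB)
        # Locking succeeded, so nobody holds foo.lock: the PID file is stale.
        lock.flock(File::LOCK_UN)
        nil
      elsif File.exist?("#{base}.pid")
        # Locking failed, so a daemon is still running.
        Integer(File.read("#{base}.pid").strip, 10)
      else
        nil # assumption: treat "locked but no PID file" as unknown
      end
    end
    # Closing the global lock file at the end of this block releases it.
  end
end
```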

Daemon code:
  lock(foo.global.lock)
  write_pid_file(foo.pid)
  lock(foo.lock)
  unlock(foo.global.lock)

  main_loop()

  lock(foo.global.lock)
  delete_file(foo.pid)
  unlock(foo.lock)
  unlock(foo.global.lock)

NOTE: lock() creates the lock file if it doesn't already exist.
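
The daemon side could be sketched in Ruby along these lines. The method name run_daemon and the base-path convention are assumptions for illustration; the main loop is passed in as a block. Like the pseudocode, it takes foo.lock with a blocking call, so it assumes nothing else is holding that lock at start-up.

```ruby
# Hypothetical sketch of the daemon start-up/shutdown sequence.
def run_daemon(base)
  flags = File::RDWR | File::CREAT # create the lock files if missing
  global = File.open("#{base}.global.lock", flags, 0644)
  lock = File.open("#{base}.lock", flags, 0644)

  global.flock(File::LOCK_EX)
  File.write("#{base}.pid", "#{Process.pid}\n")
  lock.flock(File::LOCK_EX) # held for the daemon's whole lifetime
  global.flock(File::LOCK_UN)

  begin
    yield # main loop
  ensure
    global.flock(File::LOCK_EX)
    File.delete("#{base}.pid") if File.exist?("#{base}.pid")
    lock.flock(File::LOCK_UN)
    global.flock(File::LOCK_UN)
    lock.close
    global.close # foo.global.lock itself is deliberately left on disk
  end
end
```

Usage would be something like run_daemon("/var/run/foo") { main_loop }. If the process is killed outright, the ensure block never runs, but the operating system still drops the lock on foo.lock, which is the whole point of the scheme.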

This works great, even on Windows. The only gotchas are:

- flock() doesn't work over NFS. You'll have to use some kind of fcntl() call to lock files over NFS, but I'm not sure whether Ruby provides an API for that.

- foo.global.lock is never deleted. You cannot safely delete it without creating some kind of race condition.

