On 18/02/2017 03:48, Peter Karman wrote:
> We already write the PID to Lucy lock files, so we can check if the process
> that created the lock is still running, yes?
>
> Or is that the very heuristic that you're wanting to move away from?

Yes, because it's unreliable: we can't tell whether an unrelated process has since been assigned the same PID. For example:

- We have an Indexer with PID 42.
- The machine crashes during indexing, leaving a lockfile with PID 42.
- The machine restarts and happens to assign PID 42 to another process
  before an Indexer runs.
- Any new Indexer will be locked out as long as this other process is
  running.
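To make the failure mode concrete, here is a minimal sketch of a PID-based staleness check, assuming a POSIX system where kill() with signal 0 only tests whether a process exists. The names pid_is_running and lock_looks_stale are hypothetical and are not Lucy's API. The check answers "does some process with this PID exist?", which is exactly why a reused PID makes a stale lock look live.

```python
import errno
import os


def pid_is_running(pid: int) -> bool:
    """Hypothetical helper: True if *some* process with this PID exists.

    Signal 0 performs only the existence/permission check and sends
    nothing. It cannot tell whether that process is the Indexer that
    wrote the lock file or an unrelated process that reused the PID.
    """
    if pid <= 0:
        # 0 and negative values address process groups, not one process.
        return False
    try:
        os.kill(pid, 0)
    except OSError as exc:
        # EPERM: the process exists but belongs to another user.
        # ESRCH (ProcessLookupError): no such process.
        return exc.errno == errno.EPERM
    return True


def lock_looks_stale(lock_pid: int) -> bool:
    """Hypothetical PID heuristic: a lock counts as stale only if the
    PID recorded in it is no longer in use by any process."""
    return not pid_is_running(lock_pid)
```

Any live PID passes the check. If the lock file says 42 and PID 42 now belongs to some unrelated daemon, lock_looks_stale(42) returns False and the lock is never broken.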

Right now, the only remedy is to delete the lock file manually. Fortunately, this scenario is unlikely if only the indexing process terminates abnormally, because PIDs aren't reused until the PID counter wraps around. Even after a system crash, there's a good chance that an Indexer is started before the old PID is reused.

On a shared volume like NFS, this problem is more pronounced. A single machine that goes down or loses its network connection at the wrong moment will block all other indexers until it comes back up and starts an indexing session.

Native locks are released by the operating system if a process crashes. With modern client and server implementations, this even works on NFS, after a certain timeout. Beyond that, native shared locks should also be faster on NFS than lock files.
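For comparison, a minimal sketch of a native lock, assuming POSIX fcntl-style record locks (Python's fcntl.lockf wraps fcntl(2)). The names try_lock and unlock are hypothetical, not an existing Lucy API. The point is that the kernel owns the lock, so no PID has to be written or checked, and an exiting or crashing process cannot leave it behind.

```python
import fcntl
import os
from typing import Optional


def try_lock(path: str, shared: bool = False) -> Optional[int]:
    """Hypothetical helper: try to take a POSIX record lock without blocking.

    Returns the open descriptor on success, or None if a conflicting lock
    is held by another process. The kernel releases the lock when the
    descriptor is closed or the process exits, crash included.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    mode = fcntl.LOCK_SH if shared else fcntl.LOCK_EX
    try:
        # LOCK_NB: fail immediately (EACCES/EAGAIN) instead of waiting.
        fcntl.lockf(fd, mode | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    return fd


def unlock(fd: int) -> None:
    """Release the lock explicitly; closing the descriptor also releases it.

    Caveat of fcntl locks: they belong to the process, and closing *any*
    descriptor for the same file in that process drops them.
    """
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)
```

Readers could take shared locks with try_lock(path, shared=True) while an Indexer takes the exclusive one. How quickly a lock held by a vanished NFS client is recovered depends on the client and server implementation.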

Nick
