Lucifers,
I had a close look at Lucy's locking code and while I didn't find any glaring
problems, I think it could be improved in some places. First of all, I'd like
to use a locking mechanism provided by the OS instead of rolling our own. The
main upside is that we can reliably detect whether a process holding a lock
has crashed, without relying on a heuristic. On UNIX there are:
fcntl: This is specified in POSIX and should exist on every modern UNIX
platform. The main problem is that fcntl identifies locks by a (pid, inode)
pair, so a process can hold at most one lock per file. Worse, closing *any*
file descriptor that refers to a lock file releases all locks the process
holds on that file, including locks obtained via other descriptors. This is
especially dangerous in multi-threaded applications and AFAICS can only be
worked around with refcounted global locks protected by a mutex.
lockf: Typically just a wrapper around fcntl.
flock: This is a BSD interface that also works on Linux. Locks are tied to
the open file description rather than the process, so it can be safely used
with threads. Requires Linux 2.6.12 or later to work over NFS.
On Windows, exclusive (and mandatory) locks can be obtained when opening a
file with CreateFile, shared locks with LockFileEx.
Regarding NFS, you can find lots of information online about certain
implementations not supporting certain locking APIs. But it seems to me that
this is mostly anecdotal. My guess is that modern systems "just work" with
either fcntl or flock.
To get started, I'd propose to implement a locking backend based on flock that
is the default for local operation and optional on shared volumes.
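Roughly, such a backend could look like the sketch below. The names
(FLock_Obtain etc.), the struct, and the 100 ms poll interval are all my
invention for illustration, not existing Lucy API; the timeout parameter is
there because we already rely on lock timeouts today:

```c
/* Hypothetical flock-based lock backend. If the process crashes, the
 * OS releases the lock automatically -- the crash detection we want. */
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

typedef struct { int fd; } FLock;

/* Try to take an exclusive lock on `path`, polling for up to
 * timeout_ms. Returns 0 on success, -1 on timeout or error. */
static int
FLock_Obtain(FLock *lock, const char *path, int timeout_ms) {
    lock->fd = open(path, O_RDWR | O_CREAT, 0644);
    if (lock->fd < 0) { return -1; }
    for (int waited = 0; ; waited += 100) {
        if (flock(lock->fd, LOCK_EX | LOCK_NB) == 0) { return 0; }
        if (errno != EWOULDBLOCK || waited >= timeout_ms) {
            close(lock->fd);
            lock->fd = -1;
            return -1;
        }
        usleep(100 * 1000);  /* arbitrary 100 ms poll interval */
    }
}

static void
FLock_Release(FLock *lock) {
    if (lock->fd >= 0) {
        flock(lock->fd, LOCK_UN);
        close(lock->fd);
        lock->fd = -1;
    }
}
```

Note that because flock locks live on the open file description, a second
FLock_Obtain on the same path conflicts even within one process, which is
exactly the per-lock-object behavior we want for threads.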
Some other things I noted:
1. Lock_Is_Locked is in general prone to race conditions.
1.1. It is used to test the merge lock when indexing and pruning, which is OK
because the write lock is held. But it makes the code hard to reason about.
1.2. Testing NFS snapshot read locks with ShLock_Is_Locked is inherently racy.
If it turns out that shared locks based on flock or fcntl work reliably on
somewhat modern NFS setups, I'd love to remove Is_Locked and the current
SharedLock implementation completely.
2. The background merger first obtains the write lock, then the merge lock,
then releases the write lock and acquires it again. With multiple background
mergers and no timeouts, this could lead to a classic deadlock: merger A
releases the write lock while holding the merge lock, merger B then grabs the
write lock and blocks waiting for the merge lock, and A blocks trying to
reacquire the write lock. It isn't a problem in practice because typically
there's only a single merger and we have timeouts. But again, it makes the
code hard to reason about.
Here's a sketch of how I'd rework the code. The key idea is to separate the
purging of snapshots and dead merges. Purging of dead merges is error recovery
and should be done before modifying the index. Purging snapshots is routine
cleanup that should be done only after modifying the index.
Indexer() {
    if (Lock_Request(merge_lock)) {
        // No active background merger.
        Lock_Obtain(write_lock);
        // Purge dead merge if merge.json exists.
        Lock_Release(write_lock);
        Lock_Release(merge_lock);
    }

    Lock_Obtain(write_lock);
    // Test for and read merge.json without checking
    // the merge lock.
    // Add documents.
    // Commit.
    // Purge snapshots (but not dead merge).
    Lock_Release(write_lock);
}
BackgroundMerger() {
    Lock_Obtain(merge_lock);

    Lock_Obtain(write_lock);
    // Purge dead merge if merge.json exists.
    // Set up.
    // Write merge.json.
    Lock_Release(write_lock);

    // Merge.

    Lock_Obtain(write_lock);
    // Commit.
    // Remove merge.json.
    // Purge snapshots (but not dead merge).
    Lock_Release(write_lock);

    Lock_Release(merge_lock);
}
Benefits:
- No need for Is_Locked.
- Consistent locking order with outer merge lock and
inner write lock.
- Don't purge twice when indexing and background merging.
Nick