On Tue, Nov 12, 2019, at 14:50, Anatoli wrote:
> Bron,
> 
> The proposed algo is a barrier before any single-lock. In itself it's a
> single lock, but the same code (the pseudocode for the *worker thread*
> in my previous mail) should be inserted at *every* single-lock/write
> operation location. If there's no need to pause, the overhead is
> non-existent. If a pause is requested, all worker threads would pause at
> the entrance to any single-lock/write code.
> 
> It would make the entire Cyrus daemon complete all pending write
> operations and pause new ones. At this stage, if I understand it
> correctly, the data on disk would be in a consistent state, ready to
> take a snapshot or to perform some other operation.

"complete all pending write operations and pause new ones"

How do you know when the current write operations are finished?

> Without that, if we just take a snapshot of the fs, it could happen that
> a) some files are not written entirely (i.e. caught in the middle of a
> write operation) or b) the contents of some files are newer than the
> other, i.e. the logical write operation was not atomic (e.g. mail data
> is written but indexes are not updated yet or something similar).
> 
> Maybe I didn't understand you correctly. Do you mean that finishing all
> writes and pausing new ones is not enough to guarantee a consistent state
> of the files on disk? If that's the case, what would have to be done to
> guarantee it (i.e. to make it as if Cyrus had been shut down normally)?

I mean that to finish all writes and pause new ones, you need to know that the 
writes are finished. And not just writes, but sets of writes that are held 
under a lock together. The way I know to do this is a single global lock with 
the following properties:

1) every action which locks any file within Cyrus for writing takes a SHARED 
global lock before it takes the write lock on the file.

2) the SHARED lock is held for the duration of the writes, and released once 
the writes are finished.

3) the "backup utility" takes an EXCLUSIVE lock on the global lock, which will 
only be granted once every outstanding SHARED lock has been released, i.e. once 
all in-flight writes are finished. It then takes a snapshot, and releases the 
EXCLUSIVE lock.

This guarantees full consistency.

The question that always exists for locks is "what granularity" - too wide, and 
you hold the lock for a long time. Too narrow, and you take and release it very 
frequently, adding overhead.

My first and dumbest theory is to go quite wide - add the lock in every 
runloop and command line utility such that it's held for the entire running of 
the loop or the utility! Mostly these are done within a fraction of a second. 
The one place that might be interesting is FETCH 1:* RFC822.PEEK or similar in 
imapd, where we already have some locking magic that holds a shared namelock on 
the mailbox to stop repacking while it releases the index lock to allow other 
actions on the mailbox in the meanwhile.

So we could go down a layer and only lock when we lock mailboxes or cyrusdbs, 
and refcount the global lock. This seems more likely to be a good layer, and 
not too horrible.

The other thing is that we'll need to assert that the lock isn't being held 
during each daemon's command loop, so that bugs don't leak out to deadlock 
entire servers.

And I think that's nearly it :)

Bron.

--
 Bron Gondwana, CEO, Fastmail Pty Ltd
 br...@fastmailteam.com
