> Given that Cyrus is unix processes, not threads, what mechanism are
> you proposing for the atomic here?
__atomic_load/store and __atomic_fetch_add/sub:
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

These are not related to threads or processes; they are compiler
built-ins that leverage processor-specific instructions like CAS
(compare-and-swap). The compiler knows which instructions to use for
each arch. More info here: https://lwn.net/Articles/509102/.
Clang/LLVM also has similar built-ins:
https://libcxx.llvm.org/atomic_design_a.html

So instead of using some old POSIX synchronization primitives, we go
one level lower and implement them ourselves in an opportunistic,
lock-free way with compiler built-ins. The built-ins are basically
inline-assembly abstractions available for all archs supported by the
compiler.

> I'd be keen to see something shaped like a pull request against Cyrus
> showing how this would interact with the existing locking
> architecture.

The issue is that I don't know the internals of Cyrus well enough to
identify all the places where write operations occur, and I don't have
enough understanding of the entire sync/write logic to provide a
working solution. But once the write operations are encapsulated in
separate functions and there's a list of all of them, I could
implement the efficient global locking and the backup tool that would
leverage it.

Regards,
Anatoli

On 13/11/19 02:25, Bron Gondwana wrote:
> Given that Cyrus is unix processes, not threads, what mechanism are you
> proposing for the atomic here?
>
> I'd be keen to see something shaped like a pull request against Cyrus
> showing how this would interact with the existing locking architecture.
>
> Cheers,
>
> Bron.
>
> On Wed, Nov 13, 2019, at 16:12, Anatoli via Cyrus-devel wrote:
>> > How do you know when the current write operations are finished?
>>
>> The pseudocode from my previous email:
>>
>> __start:
>>
>>   atomic_inc(write_operations_in_execution_counter)
>>   atomic_load(write_barrier)
>>
>>   if (write_barrier == ON) {
>>     atomic_dec(write_operations_in_execution_counter)
>>     sleep(100ms)
>>     goto __start
>>   } else {
>>     *perform_write_operation_with_its_own_locks()*
>>     atomic_dec(write_operations_in_execution_counter)
>>   }
>>
>> Here perform_write_operation_with_its_own_locks() is a function (one
>> among many) that today performs a write operation (the algorithm
>> implies that all occurrences of write operations should be wrapped
>> with this pseudocode).
>>
>> Once a write function returns (which means that this particular
>> write operation has completed), write_operations_in_execution_counter
>> is decremented. When all operations complete, the counter is 0,
>> which is detected by the barrier thread; it then proceeds with the
>> sync/fsync to have all files fully written to disk and notifies the
>> external process waiting to take a snapshot. Then it turns off the
>> barrier and the daemon continues normal operation.
>>
>> This is similar to a single global lock as you describe it, and it
>> sort of behaves like one, but it doesn't need to be acquired as such
>> every time and it has practically zero overhead, so it could be
>> placed inside high-frequency loops. It's similar to how modern
>> lock-free concurrency is implemented. The point is that it's
>> opportunistic locking, which can be implemented with very low
>> overhead.
>>
>> And if the code inside perform_write_operation_with_its_own_locks()
>> is guaranteed not to block on other
>> perform_write_operation_with_its_own_locks() calls, then this
>> implementation is lock-free while the barrier is OFF, i.e. no
>> potential deadlocks. When the barrier is ON, there are also no
>> deadlocks, but the operation as such isn't fully lock-free.
>>
>> Also, the way I propose to implement the barrier thread, it won't
>> block the server for more than a configurable number of seconds no
>> matter what (with this, the entire implementation is lock-free, if
>> we consider the snapshot as part of the process, i.e. it guarantees
>> progress in a finite amount of time, though some threads could
>> starve):
>>
>> atomic_store(write_barrier, ON)
>>
>> __check_write_counter:
>>
>>   atomic_load(write_operations_in_execution_counter)
>>
>>   if (write_operations_in_execution_counter == 0) {
>>     sync_data_to_disk()
>>     signal_data_ready()
>>     *wait_for_lock_release_with_timeout(5s)*
>>     atomic_store(write_barrier, OFF)
>>   } else {
>>     sleep(1ms)
>>     goto __check_write_counter
>>   }
>>
>> Here wait_for_lock_release_with_timeout(5s) waits for the
>> release-lock signal for 5 seconds and then turns off the barrier
>> whether or not the external operation (the snapshot-taking backup
>> tool) has completed, so the server resumes normal operation once the
>> 5s timeout expires.
>>
>> So while the barrier thread waits for the release-lock signal, the
>> backup tool performs a snapshot and then sends the release-lock
>> signal. The result of the signalling indicates whether the lock had
>> already been released. If the backup tool receives the code
>> indicating that the lock was released beforehand, the snapshot that
>> was taken could be inconsistent.
>>
>> In that case the backup tool could try to perform the operation
>> again or proceed another way (e.g. notify the admin that the
>> snapshot takes longer than the preconfigured lock-wait time). Again,
>> this is opportunistic locking: we try to perform an operation
>> without a guarantee of success, so we don't need to wait
>> indefinitely, again providing a lock-free guarantee. If we succeed,
>> all is good. If not, we try again or abandon the task with an error.
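A matching C sketch of the barrier side, with the same caveats: the
struct is the one assumed for the worker sketch, and the three helpers
(sync_data_to_disk and friends) are placeholders from the pseudocode,
stubbed out here; they are not real Cyrus functions:

```c
/* Sketch of the barrier-side drain loop from the pseudocode. */
#include <unistd.h>

struct barrier_state {
    int write_barrier;        /* ON (1) / OFF (0) */
    int writes_in_execution;  /* in-flight write operations */
};

/* Placeholder hooks; in a real implementation these would fsync the
 * spool/databases and signal the external backup tool. */
static int synced, signalled;
static void sync_data_to_disk(void) { synced = 1; }
static void signal_data_ready(void) { signalled = 1; }
static int wait_for_lock_release_with_timeout(int secs)
{
    (void)secs;
    return 0;  /* 0 = released in time, nonzero = timed out */
}

static void barrier_and_snapshot(struct barrier_state *shared)
{
    __atomic_store_n(&shared->write_barrier, 1, __ATOMIC_SEQ_CST);

    /* Wait until every in-flight write operation has drained. */
    while (__atomic_load_n(&shared->writes_in_execution,
                           __ATOMIC_SEQ_CST) != 0)
        usleep(1000);  /* 1ms, as in the pseudocode */

    sync_data_to_disk();
    signal_data_ready();

    /* Resume writes after at most 5s even if the backup tool is
     * still running; the tool learns the lock expired and retries. */
    wait_for_lock_release_with_timeout(5);

    __atomic_store_n(&shared->write_barrier, 0, __ATOMIC_SEQ_CST);
}
```

Note the drain loop only terminates because workers that see the
barrier ON decrement the counter before sleeping.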
>>
>> And all this would be external to Cyrus; it would be implemented in
>> the backup utility.
>>
>> I guess the best way to start with this is to identify all places
>> where data write operations occur (I suppose this is where the mail
>> data and all sorts of databases are written). Once they are
>> identified, they could be tweaked a bit for better concurrency and
>> lockability, and then we could analyze how to wrap them with a
>> global lock/barrier.
>>
>> Regards,
>> Anatoli
>>
>> On 12/11/19 06:20, Bron Gondwana wrote:
>> > On Tue, Nov 12, 2019, at 14:50, Anatoli wrote:
>> >> Bron,
>> >>
>> >> The proposed algo is a barrier before any single-lock. In itself
>> >> it's a single lock, but the same code (the pseudocode for the
>> >> *worker thread* in my previous mail) should be inserted at *every*
>> >> single-lock/write operation location. If there's no need to pause,
>> >> the overhead is non-existent. If a pause is requested, all worker
>> >> threads would pause at the entrance to any single-lock/write code.
>> >>
>> >> It would make the entire Cyrus daemon complete all pending write
>> >> operations and pause new ones. At this stage, if I understand it
>> >> correctly, the data on disk would be in a consistent state, ready
>> >> to take a snapshot or to perform some other operation.
>> >
>> > "complete all pending write operations and pause new ones"
>> >
>> > How do you know when the current write operations are finished?
>> >
>> >> Without that, if we just take a snapshot of the fs, it could
>> >> happen that a) some files are not written entirely (i.e. caught in
>> >> the middle of a write operation) or b) the contents of some files
>> >> are newer than the others, i.e. the logical write operation was
>> >> not atomic (e.g. mail data is written but indexes are not updated
>> >> yet, or something similar).
>> >>
>> >> Maybe I didn't understand you correctly.
>> >> Do you mean that finishing all writes and pausing new ones is not
>> >> enough to guarantee an integral state of the files on disk? If
>> >> that's the case, what would have to be done to guarantee it (i.e.
>> >> to make it as if Cyrus had shut down normally)?
>> >
>> > I mean that to finish all writes and pause new ones, you need to
>> > know that the writes are finished. And not just writes, but sets
>> > of writes that are held under a lock together. The way I know to
>> > do this is a single global lock with the following properties:
>> >
>> > 1) every action which locks any file within Cyrus for writing
>> > takes a SHARED global lock before it takes the write lock on the
>> > file.
>> >
>> > 2) the SHARED lock is held for the duration of the writes, and
>> > released once the writes are finished.
>> >
>> > 3) the "backup utility" takes an EXCLUSIVE lock on the global
>> > lock, which will only be granted once each write is finished. It
>> > then takes a snapshot, and releases the EXCLUSIVE lock.
>> >
>> > This guarantees full consistency.
>> >
>> > The question that always exists for locks is "what granularity" -
>> > too wide, and you hold the lock for a long time. Too narrow, and
>> > you take and release it very frequently, adding overhead.
>> >
>> > My first and dumbest theory is to go quite wide - add the lock in
>> > every runloop and command line utility such that it's held for the
>> > entire running of the loop or the utility! Mostly these are done
>> > within a fraction of a second. The one place that might be
>> > interesting is FETCH 1:* RFC822.PEEK or similar in imapd, where we
>> > already have some locking magic that holds a shared namelock on
>> > the mailbox to stop repacking while it releases the index lock to
>> > allow other actions on the mailbox in the meanwhile.
>> >
>> > So we could go down a layer and only lock when we lock mailboxes
>> > or cyrusdbs, and refcount the global lock.
>> > This seems more likely to be a good layer, and not too horrible.
>> >
>> > The other thing is that we'll need to assert that the lock isn't
>> > being held during each daemon's command loop, so that bugs don't
>> > leak out to deadlock entire servers.
>> >
>> > And I think that's nearly it :)
>> >
>> > Bron.
>> >
>> > --
>> > Bron Gondwana, CEO, Fastmail Pty Ltd
>> > br...@fastmailteam.com <mailto:br...@fastmailteam.com>
>
> --
> Bron Gondwana, CEO, Fastmail Pty Ltd
> br...@fastmailteam.com <mailto:br...@fastmailteam.com>
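For reference, the SHARED/EXCLUSIVE global lock Bron describes in the
quoted mail maps almost directly onto flock(2) on a well-known lock
file, which works across processes and so fits Cyrus's
process-per-connection model. A sketch under assumptions (the lock
path and helper names are illustrative, not Cyrus's):

```c
/* Sketch of a process-wide SHARED/EXCLUSIVE global lock via flock(2).
 * Writers hold the lock shared for the duration of a set of writes;
 * the backup utility gets the exclusive lock only once all writers
 * have finished. */
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

#define GLOBAL_LOCK_PATH "/tmp/cyrus.global.lock"  /* assumed path */

static int global_lock(int how)  /* how = LOCK_SH or LOCK_EX */
{
    int fd = open(GLOBAL_LOCK_PATH, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, how) < 0) {
        close(fd);
        return -1;
    }
    return fd;  /* hold the fd for the duration of the critical work */
}

/* Writers: take SHARED around every locked set of file writes. */
static int global_lock_shared(void)    { return global_lock(LOCK_SH); }

/* Backup utility: EXCLUSIVE blocks until every SHARED holder is done;
 * snapshot here, then unlock. */
static int global_lock_exclusive(void) { return global_lock(LOCK_EX); }

static void global_unlock(int fd)
{
    flock(fd, LOCK_UN);
    close(fd);
}
```

Any number of writers can hold the shared lock concurrently, so the
common path adds only one flock call per write set; the exclusive
request is what provides the "all writes finished" guarantee.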