On Tue, Oct 13, 2020, at 00:44, Ricardo Signes wrote:
>  * brong 
>    * locking! conversationsdb has locking problems, annotations are "a whole 
> locking nightmare"
>    * there's a lock inversion between JMAP and other calls in the locking of 
> convdb vs. mailboxes
>    * we can change how convdb locking works internally so you can 
> take/release a lock without closing the entire db, add a user lock, then 
> we're done!(?)
>    * …but it's a bunch of work, and that will be Bron's next project for Cyrus

Some more about this...

So... I just reverted my attempt from 
https://github.com/cyrusimap/cyrus-imapd/pull/3232 to fix the deadlock problems.

Here's the root problem:

mailbox.c:mailbox_open_advanced() takes a *shared NAMELOCK on the mailbox 
name*, then opens the index and first locks the user.conversations file before 
locking the indexes.

So far so good.

BUT: JMAP (and other things which need a long lock) takes a *(shared/exclusive) 
lock on the user.conversations file BEFORE it opens mailboxes*.

This leads to a lock inversion.

My attempted fix was to release the namelock while locking the conversations - 
which reverted everything to the order that the JMAP code does.  This caused 
errors because the mailbox could be deleted during that time that the lock is 
missing, and that broke invarients.

There are a couple of options:

*1) always lock conversations first*

Refactor mailbox handling to always lock the conversations.db BEFORE it tries 
to lock the mailbox name.

This means that we probably need to find a bunch of places in the code where 
we're doing copy and rename actions and lock one or both of the conversations 
databases early as well, so the lock is held across the entire action.

*2) create a new user-level lock for JMAP*

This would be useful for other things and would stop our current abuse of the 
conversations.db as a user-level lock.  Instead we could open and close it 
along with mailboxes and commit more frequently, which would give better 
transaction safety.  It does mean yet another lock, which would need to be 
managed in both mailbox_open_* and anywhere which is currently using the 
conversations database as a proxy lock.

...

There are pros and cons of both.  I'm leaning a bit towards the second one 
because it allows us to commit the conversations DB along with the mailbox in 
each case.

...

Here's the log information that led to knowing about this, using 
https://github.com/cyrusimap/cyrus-imapd/pull/3179 in a build:

2020-10-07T07:30:37.415533-04:00 fastmail1 slotf1t01/sync_client[4156099]: 
LOCKORDER: 
0=<E:6:/slot1/conf/lock/q/*S*10^37^129^209*brong@fastmaildev^com.lock>* 
1=<S:8:/slot1/conf/lock/domain/f/**fastmaildev.com/b/user/brong.lock**> 
2=<S:10:/slot1/conf/domain/f/**fastmaildev.com/user/b/brong.conversations**>*

2020-10-07T07:30:37.078907-04:00 fastmail1 slotf1t01/httpjmap[4156617]: 
LOCKORDER: 
*0=<S:11:/slot1/conf/domain/f/**fastmaildev.com/user/b/brong.conversations**> 
1=<S:13:/slot1/conf/lock/domain/f/**fastmaildev.com/b/user/brong.lock**>*

...

So the plan appears to be:

*1) do a new user-level lock* that is held when any mailbox is opened as well 
as held for the entire command by JMAP.  This will probably be a reuse of the 
existing user_namespacelock which is used for rename safety.

*2) make conversations lock inside mailboxes* - for efficiency this will be 
done by making it not locked for the entire time it's open, so making it more 
like every other cyrusdb which can be held open for efficiency and only locked 
when used.  Conversations will be lazyloaded inside mailbox if not already 
open, and it will just use refcounting like every other DB.

*3) BONUS: move per-mailbox annotations to only be a mailbox internal thing 
too* - and just like cache, lazyload it when needed.

I'm going to get started on that when I get a chance.

Bron.

--
  Bron Gondwana, CEO, Fastmail Pty Ltd
  br...@fastmailteam.com

Reply via email to