I got paged at 3am last night due to a replication mess caused by a complex set 
of renames and possibly partial rename failures (I haven't read the logs to be 
sure).

The underlying issue was folders with duplicate UNIQUEIDs, so yeah - probably a 
failed rename.  Fine.

But the root cause here is that folder locations and renames are bogus:

a) filename paths contain folder names, which not only restricts the valid
names depending on the platform, but also carries all the shell quoting risks
you might imagine.

b) renaming folders involves a bunch of IO - and we wound up removing the fast
rename codepath because it's messy with subfolders and was never really safe.
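To make point (a) concrete, here's a tiny sketch that flags which characters in a folder name would cause trouble as a path element, compared with a UUID. The character sets are illustrative assumptions (Windows' reserved file-name characters and common unquoted shell metacharacters), not an exhaustive list, and `path_hazards` is a hypothetical helper.

```python
import re
import uuid

# Illustrative, not exhaustive: characters Windows refuses in file names...
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')
# ...and characters a POSIX shell treats specially when unquoted.
SHELL_SPECIAL = set(" \t\n'\"`$&;|<>()*?[]#~!\\")

# A canonical lowercase UUID, the hypothetical on-disk folder component.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$")


def path_hazards(component):
    """Return which kinds of trouble `component` causes as a path element."""
    chars = set(component)
    hazards = set()
    if chars & WINDOWS_FORBIDDEN:
        hazards.add("windows")
    if chars & SHELL_SPECIAL:
        hazards.add("shell")
    return hazards


folder_name = 'Tax: 2019 "draft"'
uniqueid = str(uuid.uuid4())
print(sorted(path_hazards(folder_name)))                       # ['shell', 'windows']
print(bool(UUID_RE.match(uniqueid)), path_hazards(uniqueid))   # True set()
```

A uniqueid only ever contains hex digits and hyphens, so it sidesteps both problems no matter what the user calls the folder.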


The solution here is to store each folder's files on disk in a directory named
by the folder's uniqueid rather than its name, and to update the replication
protocol to replicate folders by uniqueid.  It also needs mailboxes.db changes,
but the $RACL$ work has already set up a nice way to do that.
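A toy model of why this makes renames cheap: treat mailboxes.db as a dict from folder name to record, and count the directory moves a rename needs under each layout. The record shape, the helper names and the use of '/' as the hierarchy separator are all assumptions for illustration, not the real mailboxes.db format.

```python
from dataclasses import dataclass


@dataclass
class MailboxRecord:
    uniqueid: str
    path: str  # directory relative to the spool


def _affected(db, old):
    # The folder itself plus every subfolder ('/' is the hierarchy separator here).
    return [n for n in sorted(db) if n == old or n.startswith(old + "/")]


def rename_name_pathed(db, old, new):
    """Name-based layout: each affected folder's directory has to move."""
    moves = []
    for name in _affected(db, old):
        newname = new + name[len(old):]
        rec = db.pop(name)
        moves.append((rec.path, newname))  # path mirrors the name in this layout
        rec.path = newname
        db[newname] = rec
    return moves


def rename_uniqueid_pathed(db, old, new):
    """Uniqueid layout: only the mailboxes.db keys change; nothing moves on disk."""
    for name in _affected(db, old):
        db[new + name[len(old):]] = db.pop(name)
    return []  # no filesystem operations


old_db = {n: MailboxRecord(u, n) for n, u in
          [("user/bron/Work", "a1"), ("user/bron/Work/2019", "b2"),
           ("user/bron/Other", "c3")]}
print(len(rename_name_pathed(old_db, "user/bron/Work", "user/bron/Archive")))  # 2

new_db = {n: MailboxRecord(u, "user/bron/" + u) for n, u in
          [("user/bron/Work", "a1"), ("user/bron/Work/2019", "b2")]}
print(rename_uniqueid_pathed(new_db, "user/bron/Work", "user/bron/Archive"))  # []
print(new_db["user/bron/Archive/2019"].path)  # user/bron/b2
```

With uniqueid paths the subfolder problem from (b) goes away: the rename is a database update, and replicas can match folders by uniqueid no matter what they're currently called.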

The migration path is easy.  Existing paths stay the same.  If a folder is the
source of a rename, it gets converted first and then the rename happens.  All
new folders get uniqueid paths.  Then you can just run a task that walks the
remaining folders and converts each one exactly the way rename does now -
linking the files across, updating mailboxes.db to record that it's a
uniqueid-pathed folder, and removing the old stuff.  Should be pretty easy.
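Here's a rough sketch of that conversion step, with mailboxes.db modelled as a dict and the spool as a temp directory. Every name here (`convert`, `rename`, `migrate_all`, the `uniqueid_pathed` flag) is hypothetical. It only handles the folder itself, not its subfolders, and a real version would need to clean up stray links if it crashed between steps 1 and 2; this one doesn't.

```python
import os
import tempfile


def uniqueid_relpath(name, uniqueid):
    # user/$username/... -> user/$username/$uniqueid ; $toplevel/... -> $toplevel/$uniqueid
    parts = name.split("/")
    if parts[0] == "user":
        return os.path.join("user", parts[1], uniqueid)
    return os.path.join(parts[0], uniqueid)


def convert(spool, db, name):
    """Move one folder onto a uniqueid path the way rename does: link, flip, unlink."""
    rec = db[name]
    if rec["uniqueid_pathed"]:
        return  # created that way, or already converted
    old_dir = os.path.join(spool, rec["path"])
    new_rel = uniqueid_relpath(name, rec["uniqueid"])
    new_dir = os.path.join(spool, new_rel)
    os.makedirs(new_dir, exist_ok=True)
    files = [f for f in os.listdir(old_dir)
             if os.path.isfile(os.path.join(old_dir, f))]
    for f in files:  # 1. hard-link across: no message data is copied
        os.link(os.path.join(old_dir, f), os.path.join(new_dir, f))
    rec.update(path=new_rel, uniqueid_pathed=True)  # 2. commit point
    for f in files:  # 3. remove the old links; subfolder dirs are left alone
        os.unlink(os.path.join(old_dir, f))


def rename(spool, db, old, new):
    convert(spool, db, old)  # convert the source first...
    db[new] = db.pop(old)    # ...then the rename is just a mailboxes.db update


def migrate_all(spool, db):
    for name in sorted(db):  # the background task
        convert(spool, db, name)


spool = tempfile.mkdtemp()
uid = "48902a4f-73c2-4e0f-ad4a-3e324fd33853"
os.makedirs(os.path.join(spool, "user", "bron", "Work"))
for f in ("cyrus.header", "1."):
    open(os.path.join(spool, "user", "bron", "Work", f), "w").close()
db = {"user/bron/Work": {"uniqueid": uid, "path": os.path.join("user", "bron", "Work"),
                         "uniqueid_pathed": False}}
rename(spool, db, "user/bron/Work", "user/bron/Archive")
print(db["user/bron/Archive"]["path"])
```

Because the mailboxes.db flip happens only after every file is linked, the old path stays authoritative until the new one is complete.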

We'll need to fix a bunch of tools of course.  And we'll want to store the name 
history in cyrus.header so reconstruct can still fix things.
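One possible shape for that name history, purely as a sketch: a hypothetical `HeaderRecord` holding the uniqueid plus an append-only list of (timestamp, name) pairs, and a `rebuild_mailboxes_db` function showing how reconstruct could recover name to uniqueid mappings from the headers alone. None of this is the real cyrus.header format.

```python
from dataclasses import dataclass, field


@dataclass
class HeaderRecord:
    """Hypothetical cyrus.header contents: the uniqueid plus an append-only name history."""
    uniqueid: str
    names: list = field(default_factory=list)  # (timestamp, name) pairs

    def record_rename(self, when, newname):
        self.names.append((when, newname))

    def current_name(self):
        return max(self.names)[1]  # newest entry wins


def rebuild_mailboxes_db(headers):
    """What reconstruct could do: rebuild name -> uniqueid from the headers alone."""
    db = {}
    for h in headers:
        name = h.current_name()
        if name in db:
            raise ValueError(f"{name!r} claimed by both {db[name]} and {h.uniqueid}")
        db[name] = h.uniqueid
    return db


h = HeaderRecord("a1")
h.record_rename(1000, "user/bron/Work")
h.record_rename(2000, "user/bron/Archive")
print(rebuild_mailboxes_db([h]))  # {'user/bron/Archive': 'a1'}
```

Refusing when two headers claim the same name is the kind of check that would surface a half-finished rename instead of silently picking one.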

OK, file naming.  My plan is this:

* screw 'domain/*', it's horrible.  Usernames are user@domain, and filesystems
  cope fine with that.
* user paths are $spool/user/$username/$uniqueid
* shared paths are $spool/$toplevel/$uniqueid
  (yeah, so you can't have more than ~32k folders in a single user or single
  toplevel on filesystems that cap subdirectories per directory, like ext3.
  I think that's OK)
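The layout in the list above, sketched as a function. `folder_path` is a hypothetical helper and folder names use '/' as the hierarchy separator; note that nesting depth never shows up in the path, which is why all of a user's folders end up as siblings in one directory.

```python
import posixpath


def folder_path(spool, name, uniqueid):
    """Proposed layout; `name` uses '/' as the hierarchy separator."""
    parts = name.split("/")
    if parts[0] == "user":
        # user/$username/anything/below -> $spool/user/$username/$uniqueid
        return posixpath.join(spool, "user", parts[1], uniqueid)
    # $toplevel/anything/below -> $spool/$toplevel/$uniqueid
    return posixpath.join(spool, parts[0], uniqueid)


print(folder_path("/var/spool/imap", "user/bron@example.com/Work/2019", "b2"))
# /var/spool/imap/user/bron@example.com/b2
print(folder_path("/var/spool/imap", "shared/support/tickets", "c3"))
# /var/spool/imap/shared/c3
```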

hashing - if enabled - is still on the first letter of the second component of
the folder name (the username, for user folders), so my INBOX would be:

/var/spool/imap/b/br...@fastmail.fm/48902a4f-73c2-4e0f-ad4a-3e324fd33853/
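A sketch of that hashing rule. `hash_letter` and `hashed_user_path` are hypothetical; the fallback bucket for names that don't start with an ASCII letter ('q' here) is an assumption of this sketch, and the placement of the letter simply mirrors the example path above.

```python
import posixpath


def hash_letter(name):
    """First letter of the second component of the name ('/' separated)."""
    parts = name.split("/")
    c = parts[1][:1].lower() if len(parts) > 1 else ""
    # Assumption of this sketch: anything that isn't an ASCII letter
    # lands in one fallback bucket, 'q' here.
    return c if c and "a" <= c <= "z" else "q"


def hashed_user_path(spool, name, uniqueid):
    # Letter placement mirrors the example path: $spool/$letter/$username/$uniqueid
    return posixpath.join(spool, hash_letter(name), name.split("/")[1], uniqueid)


print(hashed_user_path("/var/spool/imap", "user/bron@example.com",
                       "48902a4f-73c2-4e0f-ad4a-3e324fd33853"))
# /var/spool/imap/b/bron@example.com/48902a4f-73c2-4e0f-ad4a-3e324fd33853
```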

Which does mean that you can't move folders between users without doing a
copy.  I think that's OK: renames within one user are the 99% case, moves
between users the 1%.  I would actually happily just reject those outright and
make the user create a new folder with a new UNIQUEID, copy the data, and
delete the old one.
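Rejecting those could be as simple as comparing the on-disk parent directory of the source and destination. `check_rename`, `owner_dir` and `CrossOwnerMove` are hypothetical names; the rule is just "same user (or same toplevel), or refuse".

```python
class CrossOwnerMove(Exception):
    """Hypothetical error: the rename would need a copy, so refuse it."""


def owner_dir(name):
    # The on-disk parent of the uniqueid directory: ("user", username) or (toplevel,)
    parts = name.split("/")
    return tuple(parts[:2]) if parts[0] == "user" else (parts[0],)


def check_rename(old, new):
    if owner_dir(old) != owner_dir(new):
        raise CrossOwnerMove(
            f"{old!r} -> {new!r}: create a new folder, copy the data, "
            "delete the old one")


check_rename("user/bron/Work", "user/bron/Archive/Work")  # fine: same directory
try:
    check_rename("user/bron/Work", "user/alice/Work")
except CrossOwnerMove as e:
    print("rejected:", e)
```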

....

For later consideration - storing all the individual mail messages in a 
per-user GUID pool instead of by UID.  Same basic logic of not duplicating work 
or doing extra IO, but there's more to think about here.
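A very rough sketch of what a per-user GUID pool could look like: one stored copy per GUID, and each folder mapping UID to GUID. The class and method names are hypothetical, and using SHA-1 of the message bytes as the GUID is an assumption of this sketch. It deliberately ignores the harder parts (expunge, and deleting blobs no folder references any more).

```python
import hashlib


class GuidPool:
    """Hypothetical per-user store: one copy per message GUID,
    plus a UID -> GUID index per folder."""

    def __init__(self):
        self.blobs = {}    # guid -> message bytes (stands in for files on disk)
        self.folders = {}  # folder uniqueid -> {uid: guid}

    def append(self, folder, uid, message):
        guid = hashlib.sha1(message).hexdigest()  # content hash as the GUID
        self.blobs.setdefault(guid, message)      # stored once per user
        self.folders.setdefault(folder, {})[uid] = guid
        return guid

    def copy(self, src, uid, dst, new_uid):
        # A COPY between this user's folders touches only the index.
        self.folders.setdefault(dst, {})[new_uid] = self.folders[src][uid]


pool = GuidPool()
g = pool.append("a1", 1, b"Subject: hi\r\n\r\nhello\r\n")
pool.copy("a1", 1, "b2", 7)
print(len(pool.blobs), pool.folders["b2"][7] == g)  # 1 True
```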

Bron.

-- 
  Bron Gondwana
  br...@fastmail.fm
