[Imap-uw] Re: Benchmarking imap, filesystems

Mark Crispin Wed, 28 Jun 2006 15:29:12 -0700

I agree with your surprise at traditional UNIX mailbox formatoutperforming mbx; since that defies both common sense and years ofempirical results. The best guess that I can make is that the resultswere badly skewed by the buffer cache.

You ought to take a look at the new mix format. I've been using it for myown mailboxes for a few weeks now, and a couple of my teammates are nowdoing so as well. Our initial performance testing shows that it is muchfaster than traditional UNIX or mbx format. It addresses the backupdifficulty problem with large mailboxes, and minimizes writes to messagedata.

One neat thing about mix format is that many problems are self-healing;there has never been a corrupted mix mailbox. For the past couple ofweeks, I've been debugging caching of thread/sort data (I just fixed a bugtoday, so be sure to use today's snapshot tarball). I've been doing thiswith live data -- my one and only INBOX -- without damage to the mailboxeven when highly experimental code crashed.

However, I need to emphasize that this is new hot code. It is not yet inproduction at UW, or even much load-testing, except for three programmers.You should do your own testing first.

Following is a draft overview of the mix format that I wrote for mybosses. Within the IMAP tarball, there is a more detailed technicaldescription.


Last update: 27 June 2006

This document is an overview of the mix format, as opposed to a
technical specification (that document is mixfmt.txt).  It is aimed at
personnel who want to understand more about the mix format without
going into the bits and bytes.

Mix has the following design goals:
 . greater robustness against corruption caused by hardware or
   software failures.  Many failures are "self-healing".
 . far fewer risky random-access I/O operations; a single false
   pointer calculation in other formats will corrupt the mailbox.
 . greater ease to repair damaged mailboxes.
 . (much) greater performance.
 . extensibility for new IMAP capabilities such as annotations,
   conditional store, or more aggressing caching.


1. General description of mix format and mix files

The mix format is a "dual-use" mailbox format; that is, a name may be
both a mailbox (containing messages) and a directory (containing other
mailboxes).  This is similar to newsgroups and formats such as mh or
maildir, but it quite unlike mbx or traditional UNIX formats which use
a single flat file containing all messages.

A mix mailbox is a directory with a set of files, all prefixed with
".mix".  The use of a leading "." is so that mix's data files are not
confused with other files (e.g., flat file mailboxes) in hierarchy
listings.  These files include:
 . mailbox metadata     - per-mailbox metadata, as defined in IMAP
 . message index        - per-message static data
 . message status       - per-message dynamic data
 . message data         - one or more files containing message texts
 . sort cache           - cache of sort/thread data

This separation of data was designed to minimize the writes to each
file (and thus the risk of corruption).  None of these files are
written unless data in the file changes.

Each file, and in the case of the message status file each message,
has a "modseq", which is a strictly ascending modification sequence
value.  The modseq is changed any time the file is updated.  When any
file is read, if its modseq is identical to the previous modseq the
file is immediately closed without reading any more of the file.


2. Reading mix files; opening mailbox

Opening the mailbox is a matter of reading the mailbox metadata,
message index, and message status files.  All of these files can be
read sequentially, and there is no need to read any of the message
data.  Depending upon the state of the sort cache, an initial sort or
thread can load (some, most, or all) the data it needs from the sort
cache without reading any message data.  This is the key to mix's fast
open.

Currently, obtaining IMAP ENVELOPE or BODYSTRUCTURE data must be done
by reading the message data.  But here again, mix helps by storing the
size of the header in the message index.  Locating the boundary
between message header and message body is a time-consuming effort in
all other mailbox formats.


3. Conditions under which mix files are updated

The metadata file is updated under the following conditions:
 . new mail is delivered (the UIDLAST value is incremented)
 . a new keyword is defined
 . the current message data file for new mail changes

The message index file is updated under the following conditions:
 . new mail is delivered
 . messages are expunged

The message status file is updated under the following conditions:
 . new mail is delivered (initial flags defined for a message)
 . flag change in at least one message

The message data file are updated under the following conditions:
 . new mail is delivered (append to current message data file only)
 . messages are expunged and mailbox can be locked exclusive access

The sort cache file is updated under the following conditions:
 . additional data for sorting is extracted


4. Splitting message data file

Mix splits message data into multiple message data files.  This is
done for the following reasons:
 . reduce the burden on the backup system - message data files which
   are not the current message data file are only altered if a
   contained message is expunged
 . reduce vulnerability to corruption -- fewer writes means less risk
 . avoid running up against the 2GB filesystem limit before having to
   go into longfiles.

Mix will finalize the current message data file and start a new one
under the following circumstances:
 . the sum of the sizes of all messages in a "copy messages from one
   mailbox to another mailbox" operation, plus the current message
   data file size, exceeds the predetermined threshold.
 . the sum of the size of the first message in an "append messages
   from string to mailbox", plus the current message data file size,
   exceeds the predetermined threshold.

It is infeasible to make append behave like copy, since it would
require that the entire append be buffered (which in turn would
require double the number of disk writes and free disk space).

The predetermined threshold is currently set at 1MB.


5. Work to be done

There is currently no mix mailbox repair tool.  One problem in
building such a tool is that no mailbox corruption has occurred yet;
or rather no mailbox corruption that didn't self-heal.

So, it isn't yet well-understood how mix fails.  I will be starting on
a mailbox repair tool based upon my guesses, but only hard data of
real failures will determine the ultimate direction of that tool.

Here is what is known about the impact of corruption and repair, in
increasing order of severity of harm done:

(a) A corrupted sort cache file can be deleted without no harm done
    other than a temporary loss of efficiency.  A new sort cache file
    will be automatically created and repopulated at the next sort or
    thread.

(b) A corrupted message status file can be replaced with a dummy
    status file.  The harm done will be the loss of flags on all
    messages.

(c) A corrupted mailbox metadata file can be replaced with a newly
    built metadata file with no keywords, new UIDVALIDITY, and UIDLAST
    taken from the last message in the index.  The harm done will be
    the lost of the UIDVALIDITY (meaning some clients will re-download
    all messages) and of defined keywords.

(d) A corrupted message index file can be rebuilt from the message
    data files.  The harm done will be the possible "unexpunge" of
    some expunged messages.

(e) A corrupted message data file can be repaired using techniques
    similar to repairing corrupted mbx files.  It is likely that the
    message index file will also need to be rebuilt.

Repair operations (a) and (b) are trivial.  (c) is also simple and can
be done by a sufficiently knowledgable person using emacs.  (d) is
more complex and requires a program, albeit with a well-defined
algorithm.  (e) is unknown other than guessing from mbx and tenex
repair tools.

For the first go-around, I plan to write a program which will have
operations (a) - (d) and perhaps also an interactive query mechanism.

-- Mark --

http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum.
_______________________________________________
Imap-uw mailing list
Imap-uw@u.washington.edu
https://mailman1.u.washington.edu/mailman/listinfo/imap-uw

[Imap-uw] Re: Benchmarking imap, filesystems

Reply via email to