-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 At some point hitherto, Benjamin Scott hath spake thusly: > A Berkeley/mbox format mail store is basically a flat text file.
Being dissatisfied with the performance of UW IMAP, I've been thinking a lot about this problem. I've got some ideas about how to make some (potentially drastic, I think) performance improvements. > Messages are appended one after another. There is no index. To > read message number #368, you have to read and parse messages 1 > through 367, too. And you have to do that every time you open the > mail store. This doesn't necessarily need to be true, strictly speaking. You certainly do need to read the whole file to parse all the messages, at least once, but you don't necessarily need to do it every time you open the mail store. For instance, the IMAP server can index the file and cache the results. It then only needs to update the cache when it detects the mail store has changed. It can retain this cache on disk (perhaps only writing the cache out to disk when it closes a store), with information that will allow it to determine whether the file has changed since the cache was created, to improve performance. On the other hand, I'm not really sure what sort of performance impact there'd be if the particluar mail store was very high volume... > This gets really ugly if you have a large number of messages, or > very large messages (think file attachments). To compound the > problem, many IMAP clients disconnect and reconnect very frequently. Under the above scenario, the penalty for this can be minimized... > When it comes to deletes, the process of moving the mailbox to /tmp > is stupid, but the rest of it is unavoidable. If you delete a > message from a an mbox mail store, you have to rewrite the whole > file, just like you do for any other text file. This also is not strictly true. You can rewrite the file in place, and then you need only re-write messages that come after deleted messages. This has some interesting implications. As most old messages in a mailbox tend to be messages that the mail user wants to keep around, most messages which get deleted in a given session tend to be towards the end of the mail store. This means that deleting messages from the store can have much less of a penalty than they often do in some implementations. There are two ways (that I can think of) to do this re-writing. The technique for both is basically the same, though the mechanism is slightly different. Also, note that this only needs to be done when the client performs an EXPUNGE. Depending on what protocol you're using, you might be able to get away with simply marking a deleted message as such in memory, without modifying the physical mail store at all, until the mailbox is expunged or closed, depending on the protocol. The technique is to create a buffer which is the size of the message being deleted. Then move messages which are not being deleted down in the file in chunks of the size of the buffer. As you encounter more deleted messages, you can increase the size of the buffer, so that more of the mail store is re-written at once. Obviously you need to make adjustments for when the next chunk is less than the size of your buffer, or when you're crossing the boundary of a message that is being deleted... The two mechanisms to do this would be MMIO, or use the appropriate combinations of fseek, fread, and fwrite. The first method would likely be faster on machines with sufficient memory, but might suffer from swapping problems on machines with little memory. The second method is less prone to paging. Deciding which to use would likely involve experimentation, or providing both and using run-time information to determine which is more appropriate to the situation. Now, whether or not these methods improve performance over what UW IMAP does, I can't say. Though I suspect that they can't help but do so.... ;-) Also, the second method basically guarantees the server would use both less disk space and less memory. I'm not so sure about MMIO (seems like it would be a memory hog, but I don't really know that much about MMIO, unfortunately). I keep meaning to write a library to implement this, in order to compare, but I never seem to get around to it. Ultimately, I would want to use these in an implementation of an IMAP server with which I might replace UW IMAP, but that means I'd need to implement the IMAP protocol, and after looking through some of the RFC's, that really just doesn't look like fun. ;-) > Basically, mbox sucks. :-) Sure, but the alternatives do too. They just suck differently. They do have significant benefits over mbox. They generally don't suffer from locking issues, as there's basically no need to lock at all with message-per-file mailbox formats. They also make writes (e.g. deletes) much faster. The trick is initially opening the mail store. Large Maildir mailboxes typically take an order of magnitude longer to open than mbox files, on account of the fact that the program must go through a series of cycles: open -> read -> close -> (filesystem seek) -> open -> read -> close This takes MUCH longer than mbox's open -> read -> read -> read -> read -> close Since filesystems on Unix systems tend to minimize fragmentation by design, the likelihood is that you will not need to seek much when reading a large mbox file. You also do not need to do repeated open calls, which have substantial overhead. Opening a mailstore on mbox, for these reasons, is almost invariably *much* faster. And personally, that's the part I like waiting for the least. Filesystem design can certainly impact HOW MUCH mbox wins... But (using the mutt e-mail client as a bench mark) various members of the mutt-users mailing list tested local (identical) mailboxes using both mbox and maildir, on ext[23], reiserfs, ufs, and FreeBSD's super duper enhanced FS with patches to improve performance for filesystems with lots of small files (which did the best with maildir out of all of them), but for this operation, mbox won IN ALL CASES. - -- Derek Martin [EMAIL PROTECTED] - --------------------------------------------- I prefer mail encrypted with PGP/GPG! GnuPG Key ID: 0x81CFE75D Retrieve my public key at http://pgp.mit.edu Learn more about it at http://www.gnupg.org -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.6 (GNU/Linux) Comment: For info see http://www.gnupg.org iD8DBQE8rng5djdlQoHP510RAn4uAKCMh31ZIkSAgxGX0JNQBfmYf3VNEQCfczIi X9fal1PdArP7CwKMJIRgvkE= =zO53 -----END PGP SIGNATURE----- ***************************************************************** To unsubscribe from this list, send mail to [EMAIL PROTECTED] with the text 'unsubscribe gnhlug' in the message body. *****************************************************************
