Hi, Mark! Guess this discussion is ancient, but here goes:
On Mon, 13 May 2002, Mark Crispin wrote:
>On Mon, 13 May 2002 10:46:09 +0200 (CEST), Andreas Aardal Hanssen wrote:
>> Are you saying that the standard UNIX mailbox format is safer and more
>> consistent than, say for instance, Bernstein's Maildir format? If so,
>> please explain because I don't immediately see this.
>
>The main problem
>with maildir (leaving aside disk and inode usage issues) is that it is
>harder to use UNIX tools with it and that it is not designed to

This is IMHO a little too quick a conclusion. Standard UNIX tools are
very easy to use with Maildir/, especially considering that no parsing
is required to separate emails, and that all flags can be stored in the
file name of a message, rather than in the content.

To find messages that contain a given word (here "hei") in the subject:

  find . -type f | xargs grep -liE '^subject:.*hei'

To delete all messages from Ole:

  find . -type f | xargs grep -liE '^return-path:.*<ole>' | xargs rm -v

These operations are much, much harder with the UNIX spool system. And
better yet: they're consistent, with no locking required.

It should also be inarguable that altering a single message in a
Maildir/ is more efficient than in the UNIX spool system. Flag
manipulation is just a simple "mv" or rename(), and a content change
touches only the 3k message, not the 20MB folder file. (read on...)

>handle IMAP metadata. Also, you must have a filesystem (such as on
>Linux) which has fast open() calls or you will kill performance.

This is only true if all files are opened. A good Maildir server
implementation will always use caching (for instance, Courier IMAP), so
that only new messages, or messages that are fetched for content, need
to be opened.
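
To make the "flag manipulation is just a rename()" point concrete, here
is a minimal sketch in Python. It assumes the common Maildir naming
convention where flags follow a ":2," suffix in the file name and are
kept in ASCII order (for example "S" for seen, "F" for flagged). That
convention comes from the Maildir format description, not from anything
stated in this thread, and the helper name add_flag is purely
illustrative.

```python
import os


def add_flag(path, flag):
    """Add a single-letter flag to a Maildir message by renaming it.

    Assumption: the file name follows the "<unique>:2,<flags>" convention.
    The message content is never opened or rewritten.
    """
    directory, name = os.path.split(path)
    # Split "<unique>:2,<flags>"; a name without the suffix has no flags yet.
    base, _sep, flags = name.partition(":2,")
    # Flags are stored as a sorted set of single letters.
    new_flags = "".join(sorted(set(flags) | {flag}))
    new_path = os.path.join(directory, base + ":2," + new_flags)
    if new_path != path:
        # One rename() call; the rest of the folder is untouched.
        os.rename(path, new_path)
    return new_path
```

Calling add_flag("cur/1021279569.12345.host:2,", "S") would rename the
file to "cur/1021279569.12345.host:2,S". Compare this with a spool
file, where changing one status header means rewriting part of the
folder file under a lock.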

And considering the benefits of having potentially many hundred
thousand Maildirs spread over a distributed system (typically
$HOME/Maildir or domains/account/Maildir), rather than having the same
hundred thousand spool files (typically stored in the same directory)
in one location, I can't say that UNIX spools unconditionally win this
performance race.

I agree, though, that opening all files to look for content is
potentially slower than having one file already open and just running
through it. (read on...)

>However, it is a myth that the steps taken by maildir are the only ways
>to accomplish safe mailbox handling. A better description is that they
>are the ways to accomplish safe mailbox handling over NFS in a one
>file/message type format and only file operations to accomplish locking.
>Maildir does use locking; it locks by means of file operations and the
>roles of those three directories.

Yes, but there is a very distinct difference here: to lock a UNIX spool
file, you need to lock the *whole file*, with all messages contained in
it. This seriously limits performance when a mailbox is accessed by
several clients at once. In Maildir, each and every message can be
individually locked if the server needs this.

And how about concurrent delivery? 100 messages can be concurrently
delivered to a Maildir. How can 100 messages be concurrently delivered
to a UNIX spool? Each delivery has to lock the file, so the 100
messages end up being appended to the mail file one after another.
Concurrent delivery clients just won't be concurrent!

I'm not sure how much faster it is to look up messages in Maildir/, but
I would expect binary searching over readdir() output to compare well
with seeking through the contents of a large file. (I'm not sure about
this; others can perhaps help.)
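
The concurrent delivery described above can be sketched roughly as
follows, in Python. This is a simplified illustration, not any
particular MTA's implementation: the function names, the unique-name
scheme (time, pid, counter, hostname) and the use of rename() instead
of link()/unlink() are my own assumptions.

```python
import itertools
import os
import socket
import time

# Per-process counter so two deliveries in the same second get distinct names.
_counter = itertools.count()


def make_maildir(path):
    """Create the three Maildir subdirectories and return the Maildir path."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(path, sub), exist_ok=True)
    return path


def deliver(maildir, message_bytes):
    """Deliver one message without taking any lock on the mailbox."""
    host = socket.gethostname().replace("/", "_").replace(":", "_")
    unique = "%d.P%dQ%d.%s" % (time.time(), os.getpid(), next(_counter), host)
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)
    # O_EXCL makes open() fail rather than overwrite an existing file.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(message_bytes)
        f.flush()
        os.fsync(f.fileno())
    # Only a completely written message ever appears in new/.
    os.rename(tmp_path, new_path)
    return new_path
```

Any number of processes or threads can call deliver() on the same
Maildir at once, since each one only touches its own file. If a
delivery dies halfway, the partial message is left in tmp/ and readers
of new/ never see it.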

>Most problems with safe handling of the traditional UNIX mailbox format are
>due to people not understanding locking, and choosing to disable code that
>causes warning messages rather than fixing the underlying circumstances which
>cause it.

I would rather say that most problems with safe handling of the
traditional UNIX format come from the format not being crash proof
(mess up one file, and you've potentially messed up the whole folder).
Similarly, manual editing (which happens all the time) of a Maildir is
very easy and logical (edit one message, delete it, add a new message),
while manual editing of a UNIX spool file can be disastrous for the
whole folder if you don't know exactly what you're doing. Maildir
allows you to do pretty crazy things with one file, without this having
any impact at all on the others.

>A traditional UNIX mailbox guarded by properly implemented dot-lock
>locking is quite safe, even on NFS. It isn't fast (that wasn't what we
>were discussing), but it is safe.

By what means is it safe? What happens if the server crashes at any
stage of a message delivery with the UNIX spool format? Crash, heck,
what if some loony admin kills off all processes with SIGKILL?

Compare that to Maildir: there, you'll have stray junk in tmp/, out of
the way where nobody sees it. With UNIX spools, you need to do
recovery. Perhaps regenerate the whole file?

And what happens if for some reason two mails are delivered at the same
time? With UNIX spools, you have to lock the file, but with Maildirs,
it makes no difference at all. No locking is needed.

Maildir is fast, concurrent, consistent/crash proof, and can be easily
edited by UNIX tools, and clients for the format are very, very easily
written. Can we say the same about the UNIX spool system? :-)

Andy

--
Andreas Aardal Hanssen
