Hi, Mark!

Guess this discussion is ancient, but here goes:

On Mon, 13 May 2002, Mark Crispin wrote:
>On Mon, 13 May 2002 10:46:09 +0200 (CEST), Andreas Aardal Hanssen wrote:
>> Are you saying that the standard UNIX mailbox format is safer and more
>> consistent than, say for instance, Bernstein's Maildir format? If so,
>> please explain because I don't immediately see this.
>
>The main problem with maildir (leaving aside disk and inode usage issues)
>is that it is harder to use UNIX tools with it and that it is not
>designed to

This is IMHO a little too quick a conclusion. Standard UNIX tools are very
easy to use with Maildir/, especially considering that no parsing is
required to separate emails, and that all flags can be stored in the file
name of a message, rather than in the content.

To find messages that contain some word (here "hei") in the subject:
  find . -type f -print0 | xargs -0 grep -liE '^subject:.*hei'

To delete all messages from Ole:
  find . -type f -print0 | xargs -0 grep -liE '^return-path:.*<ole>' | xargs rm -v

These operations are much harder with the UNIX spool system. And
better yet: they're consistent, with no locking required.

It should also be inarguable that altering a single message is more
efficient in Maildir/ than in the UNIX spool system. Flag
manipulation is just a simple "mv" or rename(), and a content change
touches only the 3k message, not the 20MB folder file. (read on...)
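
A minimal sketch of such a flag change, assuming the usual
"unique:2,flags" naming for files in cur/. The message name and the
set_seen helper are made up for illustration:

```shell
# Sketch: mark one Maildir message as Seen with a single rename.
# Assumes "<unique>:2,<flags>" file names in cur/; the message name
# and the helper name are hypothetical.
set -eu

md=$(mktemp -d)                      # throwaway Maildir for the demo
mkdir "$md/tmp" "$md/new" "$md/cur"
printf 'Subject: hei\n\nbody\n' > "$md/cur/1021279569.4242.host:2,"

set_seen() {
    f=$1
    base=${f%%:2,*}                  # everything before the flag suffix
    flags=${f##*:2,}                 # current flags, possibly empty
    case $flags in
        *S*) return 0 ;;             # already marked as Seen
    esac
    # Keep the flag letters in ASCII order after adding S.
    flags=$(printf '%s' "${flags}S" | fold -w1 | LC_ALL=C sort | tr -d '\n')
    mv "$f" "$base:2,$flags"         # one rename(); content is untouched
}

set_seen "$md/cur/1021279569.4242.host:2,"
ls "$md/cur"
```

The message body is never read or rewritten here; only the directory
entry changes.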

>handle IMAP metadata.  Also, you must have a filesystem (such as on
>Linux) which has fast open() calls or you will kill performance.

This is only true if all files are opened. A good Maildir server
implementation will always use caching (for instance, Courier IMAP), so
that only new messages or messages that are fetched for content need to be
opened.
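
Even without a cache, some questions can be answered from the directory
listing alone, since the flags live in the file name. A rough sketch,
assuming the "unique:2,flags" naming in cur/ (the file names and the
count_unseen helper are invented for the example):

```shell
# Sketch: count unread messages from file names only.
# No message file is opened; the glob lists the directories and
# "[ -e ]" only checks that an entry exists.
set -eu

md=$(mktemp -d)                       # throwaway Maildir for the demo
mkdir "$md/tmp" "$md/new" "$md/cur"
: > "$md/cur/1.1.host:2,S"            # seen
: > "$md/cur/2.1.host:2,"             # no flags, so unseen
: > "$md/cur/3.1.host:2,FS"           # flagged and seen
: > "$md/new/4.1.host"                # freshly delivered, unseen

count_unseen() {
    n=0
    for f in "$1"/new/* "$1"/cur/*; do
        [ -e "$f" ] || continue       # skip a glob that matched nothing
        case ${f##*/} in
            *:2,*S*) ;;               # Seen flag present
            *) n=$((n + 1)) ;;        # in new/, or in cur/ without S
        esac
    done
    echo "$n"
}

unseen=$(count_unseen "$md")
echo "unseen: $unseen"
```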

And considering the benefits of having potentially many hundred thousand
Maildirs spread over a distributed system (typically $HOME/Maildir or
domains/account/Maildir), rather than having the same hundred thousand
spool files (typically stored in the same directory) in one location, I
can't say that UNIX spools unconditionally win this performance race.

I agree, though, that opening all files, looking for content, is
potentially slower than already having opened a file and just running
through it. (read on...)

>However, it is a myth that the steps taken by maildir are the only ways
>to accomplish safe mailbox handling.  A better description is that they
>are the ways to accomplish safe mailbox handling over NFS in a one
>file/message type format and only file operations to accomplish locking.  
>Maildir does use locking; it locks by means of file operations and the
>roles of those three directories.

Yes, but there is a very distinct difference here: To lock a UNIX spool
file, you need to lock the *whole file*, with all messages contained in
it. This seriously limits performance when several clients access the
same mailbox. In Maildir, each and every message can be individually
locked if this is necessary for the server.

And how about concurrent delivery? 100 messages can be concurrently
delivered to a Maildir. How can 100 messages be concurrently delivered to
a UNIX spool? You'd certainly have to have a single process lock the
file and then append all 100 messages to the mail file sequentially.
Concurrent delivery clients just won't be concurrent!

I'm not sure how much faster it is to look up a message in Maildir/, but
binary searching over readdir() material should compare well with seeking
through the contents of a large file (I'm not sure about this; others can
perhaps help).

>Most problems with safe handling of the traditional UNIX mailbox format are
>due to people not understanding locking, and choosing to disable code that
>causes warning messages rather than fixing the underlying circumstances which
>cause it.

I would rather say that most problems with safe handling of the
traditional UNIX format come from the format not being crash proof (mess
up one file, you've potentially messed up the whole folder). Similarly,
manual editing (happens all the time) of a Maildir is very easy and
logical (edit one message, delete it, add a new message), while manual
editing of the UNIX spool file can be disastrous for the whole folder if
you don't know exactly what you're doing.

Maildir allows you to do pretty crazy things with one file, without this
having any impact at all on the others.

>A traditional UNIX mailbox guarded by properly implemented dot-lock
>locking is quite safe, even on NFS.  It isn't fast (that wasn't what we
>were discussing), but it is safe.

By what means is it safe? What happens if the server crashes at any stage
of a message delivery with the UNIX spool format? Crash, heck, what if
some loony admin kills off all processes with SIGKILL? Compare that to
Maildir: there, you'll have stray junk in tmp/, out of the way where
nobody sees it. With UNIX spools, you need to do recovery. Perhaps
regenerate the whole file?

And what happens if for some reason two mails are delivered at the same
time? In UNIX spools, you have to lock the file, but with Maildirs, it
makes no difference at all. No locking is needed.
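
A small sketch of lock-free delivery, assuming each message is written
under tmp/ and then renamed into new/ on the same filesystem. The
deliver helper and its naming scheme (time, PID, a caller-supplied
number, host name) are only illustrative:

```shell
# Sketch: Maildir-style delivery, write under tmp/ then rename into new/.
# The helper name and the unique-name scheme are hypothetical.
set -eu

md=$(mktemp -d)                       # throwaway Maildir for the demo
mkdir "$md/tmp" "$md/new" "$md/cur"

deliver() {
    # $1 = maildir, $2 = sequence number; message text comes from stdin.
    # $$ is the same in every background job here, so the sequence
    # number is what keeps the names distinct.
    name="$(date +%s).$$_$2.$(uname -n)"
    cat > "$1/tmp/$name"              # a crash here leaves junk only in tmp/
    mv "$1/tmp/$name" "$1/new/$name"  # the message shows up in new/ complete
}

# Ten deliveries running at the same time, with no lock anywhere.
i=1
while [ "$i" -le 10 ]; do
    printf 'Subject: test %s\n\nhello\n' "$i" | deliver "$md" "$i" &
    i=$((i + 1))
done
wait

delivered=$(ls "$md/new" | wc -l | tr -d ' ')
echo "delivered: $delivered"
```

A reader of new/ never sees a half-written message, because a file only
appears there once it has been written in full under tmp/.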

Maildir is fast, concurrent, consistent/crash proof, and can be easily
edited by UNIX tools, and clients for the format are very easily
written. Can we say the same about the UNIX spool system?

:-)

Andy

-- 
Andreas Aardal Hanssen