(remember the thread, we're comparing to UNIX spools)

>On Tue, 14 May 2002, Arnt Gulbrandsen wrote:
>Andreas Aardal Hanssen <[EMAIL PROTECTED]>
>> I know this. What's your point? That I'm lousy at throwing together
>> examples, or that UNIX commands can't do the job? ;)
>That if you want reliability and correctness, it's not a lot simpler than
>using the classic berkeley format.
>It may be a little simpler. I don't know.

Are UNIX spools crash proof? How do you define "reliable"?

What happens to an mbox folder if it is incorrectly edited with "jed"? How
do you define "correct"?

The fact is that file names in a Maildir folder are unique, and the time
stamp ensures that under normal conditions, no one file will take the name
that another file had earlier. That means that local deliveries, flag
manipulation, deletion of messages or even editing of messages can all be
done concurrently, and in most cases with no locking required.

This makes the use of UNIX tools very easy and practical. If I want to
delete a message, I delete it. The only way this operation can go wrong
is if all of the following happen:

1) Someone first tampered with the system time on the server
2) The message I wanted to delete was moved or deleted by another
   client
3) A new message was delivered at the exact same second as the former
   one, and the delivering process had the same pid as the first one.
4) I deleted the message.

And even /if/ all of these conditions occur, the folder is still left in
a consistent state. Can you show an example where this is not true, or
where a UNIX spool environment handles it better?

>> >> To delete all messages from Ole:
>> >>   find . -type f | xargs grep -liE 'return-path:.*?<ole>' | xargs rm -v
>> >That command deletes 1) your message to the list 2) this reply and 3) some
>> >other messages, but not messages from ole@localhost or from [EMAIL PROTECTED]
>> I know this too. Is it very hard, though, to solve the problem?
>About as hard as solving it for berkely mbox, I'd guesstimate. In both
>cases you need a proper rfc822/2047 parser, which rather dominates the
>complexity.

Solving this problem is just a matter of changing the regular expression.
Adding "/ms" makes the regex truly multiline, and separating the header
section with "[\r\n][\r\n]" isn't exactly rocket science.
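
A minimal sketch of that header-only match, written in Python rather than
as a shell one-liner. It assumes the header block ends at the first blank
line and that folded header lines start with a space or tab; the function
names and the example.com addresses are made up for illustration, and it
is not a full rfc822 parser.

```python
import re

def header_section(raw: str) -> str:
    """Return everything before the first blank line (the header block)."""
    normalized = re.sub(r"\r\n?", "\n", raw)   # treat CRLF and bare CR as LF
    head, _sep, _body = normalized.partition("\n\n")
    return head

def unfold(headers: str) -> str:
    """Join continuation lines (leading space or tab) onto the previous line."""
    return re.sub(r"\n[ \t]+", " ", headers)

def is_from(raw: str, address: str) -> bool:
    """True if the Return-Path header names exactly `address`."""
    pattern = re.compile(
        r"^return-path:\s*<" + re.escape(address) + r">\s*$",
        re.IGNORECASE | re.MULTILINE,
    )
    return bool(pattern.search(unfold(header_section(raw))))

if __name__ == "__main__":
    # A message that only quotes the header in its body must not match.
    quoted = "Return-Path: <andy@example.com>\n\nReturn-Path: <ole@example.com>\n"
    print(is_from(quoted, "ole@example.com"))
```

Because the address is escaped and anchored between "<" and ">", a
message from ole@example.com no longer matches a search for a different
address that merely contains "ole".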

No, you don't need an rfc822 parser to find the from-header of a message.
This is false for both mbox and Maildir. However, to remove a message from
an mbox, you need to manipulate the mbox content, one method being "copy
some, skip some, copy rest". During this operation, you /must/ lock the
file to prevent concurrent access. For most operations (at least for
deletes), this is unnecessary with Maildirs.
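
To make the difference concrete, here is a rough Python sketch of both
deletes. It assumes a UNIX system, a plain mbox where every message
starts with a "From " line, and flock() as the locking scheme (real
servers may use dotlocks or fcntl locks instead). Both function names are
hypothetical.

```python
import fcntl
import os

def maildir_delete(path: str) -> bool:
    """Delete one Maildir message file. No lock is taken: unlink() is atomic."""
    try:
        os.unlink(path)
        return True
    except FileNotFoundError:
        # Another client already moved or deleted it; nothing is corrupted.
        return False

def mbox_delete(mbox_path: str, index: int) -> None:
    """Remove message number `index` (0-based): copy some, skip one, copy rest."""
    with open(mbox_path, "r+b") as f:
        # Assumed locking scheme; the lock is held for the whole rewrite.
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            kept = []
            current = -1
            for line in f:
                if line.startswith(b"From "):
                    current += 1          # a new message begins here
                if current != index:
                    kept.append(line)     # copy everything except the victim
            f.seek(0)
            f.writelines(kept)
            f.truncate()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

Note that the mbox version rewrites the whole file in place, so a crash in
the middle of the rewrite can damage other messages too; the Maildir
version only ever touches the one file.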

><about races>
>> No guarantee.. but what's the worst case scenario?
>1. Returning "success" when in fact the command did not succeed. If the
>   file name's used to store flags, and those flags are set, this can
>   happen.

So what? Two clients that fight over flag settings in IMAP always override
each other. No big deal.

>2. Deleting the wrong mail. If someone deletes one of the same message as
>   you want to delete, and some tempnam()-like code then chances on the
>   same name, this can happen.

Do you know how filenames are created in Maildir? If you knew this, then
you would also know that the case you suggest is practically impossible.

Filenames contain a timestamp and the pid of the delivering process, and
the files are created with O_EXCL. For these names to collide, you must
either tamper with the system time and deliver a message with the same PID
as the former message, or two manual operations must take place on the
same depository.
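
A minimal sketch of such a delivery in Python, assuming the usual
time.pid.host name and the tmp/ and new/ subdirectories of a Maildir
(neither is spelled out above). The function name and the `now` parameter
are hypothetical; `now` exists only so a collision can be forced by hand.

```python
import os
import socket
import time

def maildir_deliver(maildir: str, data: bytes, now=None) -> str:
    """Write `data` as a new message and return its final path."""
    stamp = int(time.time() if now is None else now)
    name = f"{stamp}.{os.getpid()}.{socket.gethostname()}"
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)
    # O_EXCL makes open() fail rather than reuse an existing name.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    try:
        # link() also fails if the target exists, so nothing is overwritten.
        os.link(tmp_path, new_path)
    finally:
        os.unlink(tmp_path)
    return new_path
```

Even when the same process delivers twice with the same timestamp, the
second delivery fails with FileExistsError and the first message is left
untouched.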

How do you suggest this poses a problem in real life?

>> Ok, back to you: Try to think of the impact of the race conditions, and
>> then _conclude_ with something.
>Your simplicity is very simple, but it's not reliable.
>You use grep as an example. Well, using grep for searching doesn't handle
>quoted-printable or base64 encoding, doesn't handle character sets
>correctly (think utf-8 vs. iso-8859-1), doesn't handle rfc822 header
>wrapping or do header/body differentiation, doesn't handle rfc2047 header
>encoding or 822-style quoting. There may be more - those are just the
>problems I remember from using grep on MH mailboxes.

If you want to search in base64 encoded data, do

cat <message> | base64 -d - | grep ...

If you have problems with codeset, use

cat <message> | iconv -f iso-8859-1 -t US-ASCII | grep ...

Dan J. Bernstein has written a toolbox that lets you search
for arbitrary rfc822 content: http://cr.yp.to/mess822.html

So for all cases you suggest, there's a solution with command line UNIX
tools.
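
The same decode-then-search idea, sketched in Python for anyone who wants
to try it outside a pipeline. It assumes the base64 body has already been
separated from its headers and is a single part in a known charset; the
function name is made up for illustration.

```python
import base64
import re

def grep_base64(encoded_body: str, pattern: str, charset: str = "iso-8859-1") -> bool:
    """Decode a base64 body, convert it from `charset`, then search it."""
    raw = base64.b64decode(encoded_body)           # like "base64 -d"
    text = raw.decode(charset, errors="replace")   # like "iconv -f <charset>"
    return re.search(pattern, text) is not None    # like "grep"
```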

>Now, if you don't care about reliability, that's fine for you. But this is
>the IMAP mailing list. An IMAP server may not "mostly" delete the right
>messages. It may not return "most" of the search results. If you want to
>argue that maildir is a superior mailbox format, "grep mostly works" won't
>do.
>--Arnt

As stated in my former mail, please sketch some problems that may arise
with Maildirs, causing the methods to be unreliable.

Then, compare to a UNIX mail spool using the same UNIX tools, and
conclude. "The tool 'iconv' certainly works better with mbox here" and so
on. Because as you can see, for all the cases you claim, I have solutions.

I'm certainly not saying that there are solutions for /all/ problems, but
for most problems where you want to use UNIX tools, Maildir is safer, more
reliable and easier than mbox.

And please read the thread: the claim was that Maildir is less reliable
than UNIX spools, and that UNIX tools are easier to use with mbox than
Maildir. Nobody said that using UNIX tools with either format is reliable or
efficient or anything.

Andy

-- 
Andreas Aardal Hanssen

