On Mon, Feb 20, 2006 at 09:21:04AM +0100, Christian Ferrari wrote:
> I'm writing a shell script: it identifies "duplicated" mails, strips
> "offending header lines" like "Delivered-To:" and hard links the files,
> paying attention to different filesystems/devices. To avoid a complete
> scan, a persistent "memory" is saved in a status file.
> After some refinements I'll release it.
> Is anyone willing to try it on a (test!) battlefield?

Not me. If you're going to have an MTA independently fork and deliver copies
of the same message to A, B, C and D, then there are massive race conditions
involved:
- you require the MTA to complete delivery to A before starting delivery to
B
- after delivering to A, you require your delivery script to finish updating
its cache database before the MTA starts delivery to B (otherwise you'll
miss the fact that these messages are the same)
- if the MTA can make two or more simultaneous deliveries, which every MTA
I know does, you'll need to lock your cache database to prevent simultaneous
updates [or have an append-only cache which needs to be purged periodically]
- if the MTAs are distributed across multiple front-ends, the cache database
will need to be stored on some central server [flock() can't be relied on
over NFS]
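
To make the locking point concrete, here is a minimal sketch of a locked,
append-only cache update. It is in Python rather than shell, and it assumes a
single host with the cache on a local filesystem (per the NFS caveat above).
The function name record_delivery, the one-key-per-line file format and the
cache path are all made up for illustration; this is not the script under
discussion.

```python
import fcntl


def record_delivery(cache_path: str, message_key: str) -> bool:
    """Record message_key in the cache file.

    Returns True if the key was new, False if an earlier delivery already
    recorded it. The exclusive flock() serialises concurrent deliveries on
    one host; it is NOT assumed to work across NFS clients.
    """
    # "a+" creates the file if needed and forces every write to the end.
    with open(cache_path, "a+") as cache:
        fcntl.flock(cache, fcntl.LOCK_EX)  # blocks while another delivery holds it
        try:
            cache.seek(0)
            seen = set(cache.read().splitlines())
            if message_key in seen:
                return False
            cache.write(message_key + "\n")
            cache.flush()  # make the key visible before the lock is released
            return True
        finally:
            fcntl.flock(cache, fcntl.LOCK_UN)
```

Even with this in place, the file grows without bound and still needs the
periodic purge mentioned above, and the MTA ordering problems in the first
two points are untouched.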

I think you need more than a simple shell script to address these issues.

It would be far more reliable IMO to have the MTA deliver one copy of the
message itself, when it receives a message with multiple recipients in the
envelope. OTOH, your proposed mechanism has the advantage that it could
identify multiple copies of the same message which are delivered in
different SMTP transactions.

There might be other ways to implement this. For example:
- take an MD5 of the message (having stripped out the Delivered-To: and
  Received: headers and anything else which might be unique to one copy)
- deliver the message to /somepath/x/y/md5hash (where x and y are parts of
  the md5 hash). If this file already exists, leave it alone.
- hard link from the maildir to this location
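
The first two steps could be sketched roughly as follows, in Python. This is
one reading of the list above: it hashes whatever remains of the message
(other headers plus body) once the per-copy headers are dropped. The names
canonical_md5, store_path and VOLATILE_HEADERS are hypothetical, LF line
endings are assumed, and taking x and y to be the first two hex characters of
the hash is just one possible choice.

```python
import hashlib
import os

# Headers assumed to differ between copies of the same message. The list
# above names Delivered-To: and Received:; extend the tuple as needed.
VOLATILE_HEADERS = (b"delivered-to:", b"received:")


def canonical_md5(raw_message: bytes) -> str:
    """MD5 hex digest of the message with per-copy headers removed."""
    header_block, sep, body = raw_message.partition(b"\n\n")
    kept = []
    skipping = False
    for line in header_block.split(b"\n"):
        if line[:1] in (b" ", b"\t"):
            # Continuation line: it shares the fate of the header it continues.
            if not skipping:
                kept.append(line)
            continue
        skipping = line.lower().startswith(VOLATILE_HEADERS)
        if not skipping:
            kept.append(line)
    return hashlib.md5(b"\n".join(kept) + sep + body).hexdigest()


def store_path(base: str, digest: str) -> str:
    """Map a digest to base/x/y/digest, with x and y its first two hex chars."""
    return os.path.join(base, digest[0], digest[1], digest)
```

Two copies that differ only in Delivered-To: and Received: then map to the
same path under the base directory, while any difference elsewhere in the
message gives a different one.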

Effectively that's using the filesystem itself as your cache. With care,
i.e. the right choice of atomic operations, it can be made lock-free. You'll
need a separate /somepath base for each filesystem that contains Maildirs,
of course.
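
As a rough illustration of "the right choice of atomic operations", the
sketch below (Python) writes the message to a private temporary file and then
uses link(), which fails if the target already exists, to publish it. The
function name deliver_once and its parameters are invented for this example;
both paths are assumed to be on the same filesystem, and file ownership,
permissions and cleanup of unreferenced store files are not addressed.

```python
import os
import tempfile


def deliver_once(raw_message: bytes, store_file: str, maildir_file: str) -> bool:
    """Keep one shared copy at store_file and hard link maildir_file to it.

    Returns True if this call created the shared copy, False if another
    delivery had already created it. No lock is taken at any point.
    """
    store_dir = os.path.dirname(store_file) or "."
    os.makedirs(store_dir, exist_ok=True)  # harmless if another delivery races us

    # Write the complete message under a private name first, so that
    # store_file never exists in a half-written state.
    fd, tmp_path = tempfile.mkstemp(dir=store_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(raw_message)
            tmp.flush()
            os.fsync(tmp.fileno())
        try:
            os.link(tmp_path, store_file)  # atomic; fails if store_file exists
            created = True
        except FileExistsError:
            created = False  # someone else got there first: leave it alone
    finally:
        os.unlink(tmp_path)  # drop the private name in either case

    # The maildir entry is just another name for the shared file.
    os.link(store_file, maildir_file)
    return created
```

If two deliveries race, exactly one link() to store_file succeeds and the
other sees that the file exists, so both end up linking their maildir entry
to the same inode.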

Regards,

Brian.

_______________________________________________
Courier-imap mailing list
[email protected]
Unsubscribe: https://lists.sourceforge.net/lists/listinfo/courier-imap