On Mon, Feb 20, 2006 at 09:21:04AM +0100, Christian Ferrari wrote:
> I'm writing a shell script: it identifies "duplicated" mails, strip
> "offending header lines" like "Delivered-To:" and hard links files
> paying attention to different filesystems/devices. To avoid complete
> scanning, a persistent "memory" is saved in a status file.
> After some refinements I'll release it.
> Are there some guys want to try it on a (test!) battlefield?

Not me.

If you're going to have an MTA independently fork and deliver copies of the
same message to A, B, C and D, then there are massive race conditions
involved:

- you require the MTA to complete delivery to A before starting delivery
  to B

- after delivering to A, you require your delivery script to finish
  updating its cache database before the MTA starts delivery to B
  (otherwise you'll miss the fact that these messages are the same)

- if the MTA can make two or more simultaneous deliveries, which every MTA
  I know does, you'll need to lock your cache database to prevent
  simultaneous updates [or have an append-only cache which needs to be
  purged periodically]

- if the MTAs are distributed across multiple front-ends, the cache
  database will need to be stored on some central server [can't use
  flock() over NFS]

I think you need more than a simple shell script to address these issues.

It would be far more reliable IMO to have the MTA deliver one copy of the
message itself, when it receives a message with multiple recipients in the
envelope.

OTOH, your proposed mechanism has the advantage that it could identify
multiple copies of the same message which are delivered in different SMTP
transactions.

There might be other ways to implement this. For example:

- take an MD5 of the message body (having stripped out Delivered-To: and
  Received: and anything else which might be unique to one copy)

- deliver the message to /somepath/x/y/md5hash (where x and y are parts of
  the md5 hash). If this file already exists, leave it alone.

- hard link from the maildir to this location

Effectively that's using the filesystem itself as your cache. With care,
i.e. the right choice of atomic operations, it can be made lock-free.

You'll need a separate /somepath base for each filesystem that contains
Maildirs, of course.

Regards,

Brian.
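
For illustration, the filesystem-as-cache idea above might look roughly like
the Python sketch below. It is only a sketch under several assumptions: the
list of per-copy headers, the function names (canonical, deliver), the
choice to store the stripped form of the message, and the Maildir file
naming are all made up for this example, and it assumes LF line endings and
that the store and the Maildir live on the same filesystem. The lock-free
part relies on link(2) being atomic and failing if the target name already
exists.

```python
import hashlib
import itertools
import os
import socket
import tempfile
import time

# Assumption: headers that may differ between copies of one message.
VOLATILE_HEADERS = (b"delivered-to:", b"received:", b"return-path:")

_seq = itertools.count()  # per-process delivery counter for unique names


def canonical(raw: bytes) -> bytes:
    """Return the message with per-copy headers (and their folded lines) removed."""
    head, sep, body = raw.partition(b"\n\n")
    kept = []
    skipping = False
    for line in head.split(b"\n"):
        if line[:1] in (b" ", b"\t"):
            # Folded continuation line: shares the fate of its header.
            if not skipping:
                kept.append(line)
            continue
        skipping = line.lower().startswith(VOLATILE_HEADERS)
        if not skipping:
            kept.append(line)
    return b"\n".join(kept) + sep + body


def deliver(raw: bytes, store: str, maildir: str) -> str:
    """Store one copy under store/x/y/md5hash and hard link it into maildir/new."""
    canon = canonical(raw)
    digest = hashlib.md5(canon).hexdigest()
    bucket = os.path.join(store, digest[0], digest[1])
    os.makedirs(bucket, exist_ok=True)
    target = os.path.join(bucket, digest)

    if not os.path.exists(target):
        # Write a private temp file, then publish it with link(), which is
        # atomic and refuses to overwrite an existing name.
        fd, tmp = tempfile.mkstemp(dir=bucket, prefix=".tmp-")
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(canon)
                f.flush()
                os.fsync(f.fileno())
            try:
                os.link(tmp, target)
            except FileExistsError:
                pass  # a concurrent delivery stored it first; leave it alone
        finally:
            os.unlink(tmp)

    # Hypothetical Maildir-style unique name: time, pid, counter, host.
    unique = "%d.P%dQ%d.%s" % (
        int(time.time()), os.getpid(), next(_seq), socket.gethostname())
    dest = os.path.join(maildir, "new", unique)
    os.link(target, dest)
    return dest
```

Two deliveries of the same message to different Maildirs should then end up
as links to a single inode, with no shared database to lock. Note that
storing the stripped form also drops the Received: trail from what the user
sees, which may or may not be acceptable.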
_______________________________________________
Courier-imap mailing list
[email protected]
Unsubscribe: https://lists.sourceforge.net/lists/listinfo/courier-imap
