> run md5sum on the mail message body and store the resulting string in
> a file then compare each message against this list in the file, if the
> md5sums of the message body are the same then the message is
> guaranteed to be the same.

Nope. If the md5sum hashes are different, the messages are guaranteed to be different. If the hashes are the same, there is always a slight probability that the messages are *NOT* the same.
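
To make the distinction concrete, here is a minimal sketch in Python (not anything posted in this thread) of the body-hash approach with one extra step: when two digests match, the bodies themselves are compared, so a hash match alone is never treated as proof. The function name `find_duplicate_bodies` and the sample data are made up for illustration, and the message bodies are assumed to be available as byte strings already.

```python
import hashlib

def find_duplicate_bodies(bodies):
    """Return (index, first_index) pairs for bodies that repeat an earlier one.

    `bodies` is a list of message bodies as bytes (hypothetical input format).
    """
    seen = {}        # MD5 hex digest -> indices of distinct bodies with that digest
    duplicates = []
    for i, body in enumerate(bodies):
        digest = hashlib.md5(body).hexdigest()
        candidates = seen.setdefault(digest, [])
        for j in candidates:
            # Same digest: confirm with a real comparison before calling it a duplicate.
            if bodies[j] == body:
                duplicates.append((i, j))
                break
        else:
            # Either a new digest or a genuine collision: remember this body as distinct.
            candidates.append(i)
    return duplicates

bodies = [b"hello\n", b"other text\n", b"hello\n"]
print(find_duplicate_bodies(bodies))  # [(2, 0)]
```

The extra comparison only runs when digests match, so it costs next to nothing for unique messages while removing the "guaranteed" problem entirely.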
True, it is a one in a multi-billion chance, so I will take my chances. After all, the chance of getting a bit error on my hard disk is bigger than that. Your solution with formail is much easier on the CPU, but the probability of email systems generating the same Message-Id header is much, much larger than that of an md5sum clash...
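
For a rough idea of the numbers, the following sketch computes an upper bound on the chance that any two of n distinct messages share an MD5 digest. It assumes the 128-bit digests behave like uniformly random values, which is only reasonable for accidental clashes (deliberately constructed MD5 collisions are a different matter). The function name `md5_collision_bound` is made up for illustration.

```python
from fractions import Fraction

def md5_collision_bound(n_messages):
    """Upper bound on P(at least one accidental collision) among n distinct messages.

    Union bound over all pairs: each pair collides with probability 2**-128,
    assuming uniformly random 128-bit digests.
    """
    pairs = n_messages * (n_messages - 1) // 2
    return Fraction(pairs, 2**128)

# Example: a mailbox with one million distinct messages.
p = md5_collision_bound(1_000_000)
print(f"{float(p):.3e}")
```

Under that assumption the bound stays tiny for any realistic mailbox size, which supports the "I will take my chances" position without making the match a guarantee.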
With a hash value of limited length, you cannot guarantee that longer data chunks are distinct.

> > In some folder, for some reasons I have duplicate mails (same mail, two
> > or three times).

Vincent, I have posted a small hack (shell script using formail) to delete duplicate messages based on the Message-Id: header. Search the archive for it and read my notes carefully.

As I got some feedback and it currently is not wise to run it more than once [1], I already planned to rewrite it and post it again. Silly me even sort of announced it without the time to code. This seems like a good opportunity to actually rewrite it and release it...

...guenther
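
For readers who want to see the Message-Id idea without digging through the archive, here is a minimal sketch using Python's standard `mailbox` module. This is not the formail shell script mentioned above, only an illustration of the same principle; the function name `remove_duplicates_by_message_id` is made up, and the folder is assumed to be in mbox format. Try it on a copy of the folder first.

```python
import mailbox

def remove_duplicates_by_message_id(mbox_path):
    """Delete messages whose Message-Id was already seen earlier in the mbox.

    Returns the number of messages removed. Messages without a Message-Id
    header are left alone, since there is nothing to compare them by.
    """
    box = mailbox.mbox(mbox_path)
    box.lock()
    try:
        seen = set()
        removed = 0
        # keys() returns a list, so removing entries while looping is safe.
        for key in box.keys():
            msg_id = box[key].get("Message-Id")
            if msg_id is None:
                continue
            msg_id = str(msg_id).strip()
            if msg_id in seen:
                box.remove(key)
                removed += 1
            else:
                seen.add(msg_id)
        box.flush()  # write the changes back to the file
    finally:
        box.close()  # close() also flushes pending changes and releases the lock
    return removed
```

The first copy of each message is kept and later copies are dropped, so the order of the remaining messages does not change.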
