HvR,
> > run md5sum on the mail message body and store the resulting string in
> > a file then compare each message against this list in the file, if the
> > md5sums of the message body are the same then the message is
> > guaranteed to be the same.
>
> Nope.
Calming down and reading RFC 1321...
> If the md5sum hashes are different, the messages are guaranteed to be
> different. If the hashes are the same, there is always a slight
> probability that the messages are *NOT* the same.
>
> With a hash value of limited length, you cannot guarantee distinct
> hashes for longer data chunks.
The MD5 algorithm is indeed designed for the purpose mentioned -- to
"reliably" identify a message by a short checksum. And it is very widely
used for this purpose.
So you are quite right.
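
For what it's worth, here is a rough sketch of the duplicate check in
Python rather than with md5sum and a file. The function name
find_duplicates and the in-memory dict are my own assumptions, not
anything Evolution does. A matching digest is treated only as a hint,
and the bodies are then compared byte for byte:

```python
import hashlib

def find_duplicates(bodies):
    """Return (i, j) index pairs where bodies[j] repeats an earlier bodies[i]."""
    seen = {}        # MD5 hex digest -> indices of distinct bodies with that digest
    duplicates = []
    for j, body in enumerate(bodies):
        digest = hashlib.md5(body).hexdigest()
        for i in seen.get(digest, []):
            # Equal digests only make equality very likely,
            # so compare the actual bytes to be sure.
            if bodies[i] == body:
                duplicates.append((i, j))
                break
        else:
            # No identical body seen so far: remember this one.
            seen.setdefault(digest, []).append(j)
    return duplicates

bodies = [b"hello\n", b"world\n", b"hello\n"]
print(find_duplicates(bodies))  # [(0, 2)]
```

The byte comparison only runs when two digests match, so it costs
next to nothing in practice.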
The only thing that triggered me was the guarantee:
As an MD5 sum is limited to 128 bits, there are only 2^128 different
fingerprints, and therefore feeding in 2^128 + 1 different messages will
produce at least one fingerprint that is associated with 2 different mails.
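
Nobody can try 2^128 + 1 messages, but the same counting argument can
be watched at a smaller scale. The sketch below is only an
illustration: it cuts the MD5 fingerprint down to 16 bits (the BITS
constant and the "message N" test data are my own assumptions), so a
repeat has to show up within 2^16 + 1 different messages:

```python
import hashlib

BITS = 16  # illustrative truncation; a real MD5 fingerprint has 128 bits

def short_fingerprint(data):
    """Return the first BITS bits of the MD5 digest as an integer."""
    full = int.from_bytes(hashlib.md5(data).digest(), "big")
    return full >> (128 - BITS)

def first_collision():
    """Find two different messages that share a truncated fingerprint."""
    seen = {}  # truncated fingerprint -> first message that produced it
    # 2**BITS + 1 different messages cannot all get different fingerprints.
    for n in range(2**BITS + 1):
        msg = b"message %d" % n
        fp = short_fingerprint(msg)
        if fp in seen:
            return seen[fp], msg, n + 1
        seen[fp] = msg
    raise AssertionError("unreachable: more messages than fingerprints")

a, b, tried = first_collision()
print(a, b, "after", tried, "messages")
```

In practice the first repeat tends to turn up long before the
2^16 + 1 bound, which is only the worst case.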
...guenther
--
char *t="[EMAIL PROTECTED]";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
_______________________________________________
evolution maillist - [EMAIL PROTECTED]
http://lists.ximian.com/mailman/listinfo/evolution