Re: msgid instead of seq. number for output files

Earl Hood Thu, 30 Jul 1998 22:54:00 -0400
On July 30, 1998 at 08:09, Christopher Lindsey wrote:

> > Ah, before I get asked: If no message-id is given mhonarc should/will
> > create it's own id on the fly as it does now.
> 
> If it does this (and it should do it now for the duplicate message checking),

MHonArc adding IDs will not help duplicate message checking.  It helps in
other ways.

> checks should be made for RFC-compliant Message-Id: headers.  A lot of 
> messages that I get from misconfigured relays don't send unique Message-Ids,
> therefore breaking the duplicate message checking.

Yes, that is a problem.

> Of course, not everyone would like to use md5sums for this.  It could
> really slow things down if you were adding 10000 messages to the archive
> and needed to calculate a sum for each one.  So what about the possibility
> of choosing which header you want to use for duplicate checking?  Is
> that easy to create a resource for?  Is it extensible to Achim's 
> suggestion?

Something like FROMFIELDS can be done.  However, I will need more
information on what is required to make it effective.  For example,
is a simple string compare sufficient, or is something more
elaborate needed?  Can I take the two md5sums and just do a string
compare to determine uniqueness?  Or are additional computations
needed, depending on the fields being evaluated?
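
To make the question concrete, here is a rough sketch of the simple
string-compare case, written in Python purely for illustration.  The
function names and the `fields` parameter are hypothetical; this is
not an existing MHonArc resource, and it assumes an exact match on
the chosen header values is good enough.

```python
from email import message_from_string


def dedup_key(raw_message, fields=("message-id",)):
    """Build a comparison key from the chosen header fields.

    `fields` is a hypothetical, user-selectable list of header names.
    Header lookup is case-insensitive; a missing header becomes "".
    """
    msg = message_from_string(raw_message)
    return tuple((msg.get(name) or "").strip() for name in fields)


def filter_duplicates(raw_messages, fields=("message-id",)):
    """Keep the first message seen for each key; drop later matches."""
    seen = set()
    unique = []
    for raw in raw_messages:
        key = dedup_key(raw, fields)
        if not any(key):
            # None of the chosen headers is present, so there is
            # nothing to compare on: keep the message.
            unique.append(raw)
            continue
        if key in seen:
            continue  # plain equality on the header values
        seen.add(key)
        unique.append(raw)
    return unique
```

With `fields=("message-id", "subject")`, two messages that share a
Message-Id but differ in Subject would both be kept, which is one way
to cope with relays that reuse IDs.  Whether that is sufficient is
exactly the open question.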

BTW, I prefer not to have MHonArc do anything like computing md5sums.
You cited the main reason: performance.  If md5sums are needed, the
user should compute them via the MTA or some other program where it
can be done more efficiently.  It also promotes a division of labor.
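
As an illustration of that division of labor, a separate filter could
stamp each message with a digest before it reaches the archiver.  The
sketch below is in Python only for illustration.  The header name
`X-Body-MD5` and the function name are made up, and it assumes the
message starts directly with its headers and uses LF line endings.

```python
import hashlib

# Hypothetical header name; not an MHonArc or MTA convention.
DIGEST_HEADER = "X-Body-MD5"


def add_body_digest(raw_message):
    """Prepend a header carrying the MD5 hex digest of the body.

    The body is taken to be everything after the first blank line.
    """
    _headers, _sep, body = raw_message.partition("\n\n")
    digest = hashlib.md5(body.encode("utf-8")).hexdigest()
    # A new header line can simply be placed ahead of the others.
    return f"{DIGEST_HEADER}: {digest}\n{raw_message}"
```

Two messages with byte-identical bodies get the same header value, so
the archiver would only need to compare one header as a string and
would never compute a sum itself.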

        --ewh

----
             Earl Hood              | University of California: Irvine
      [EMAIL PROTECTED]      |      Electronic Loiterer
http://www.oac.uci.edu/indiv/ehood/ | Dabbler of SGML/WWW/Perl/MIME