On September 3, 1998 at 13:10, Claire McNab wrote:
> > For this reason I prefer MD5 sums of the message body -- there is
> > statistically only a 1:18446744073709551616 chance of matching a
> > false positive. For anyone interested, I've written some sendmail
> > 8.9.1 patches to add md5sums at the sendmail level (based on the work of
> > Martin Hamilton) and also have a procmail recipe to do the same.
>
> This sounds like a useful issue to tackle. I've been hit by it a few
> times, when from other parts of my site, or in other messages to
> the list, I have referred to articles by filename ... only to find
> that later, when I have rebuilt the archives to roll out new .rc
> files, the filename has changed :(
>
> However, I wonder about the MD5 method. Without knowing anything
> about MD5, could it work with 8.3 filenames? I build my archives on a
> DOS box, so am constrained to that format.
Such a method will not work under 8.3 filenames. The current method
is friendly to 8.3 systems.
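
To illustrate the constraint: an MD5 sum written in hexadecimal is always 32 characters, while an 8.3 filename allows at most 8 characters before the dot and 3 after it. The following is only a sketch in Python, not MHonArc code. The helper names md5_name and fits_8_3 are hypothetical, and msg00042.htm is just an illustrative sequentially numbered name.

```python
import hashlib


def md5_name(body: bytes) -> str:
    """Return the hex MD5 digest of a message body (always 32 characters)."""
    return hashlib.md5(body).hexdigest()


def fits_8_3(filename: str) -> bool:
    """True if the name has a 1-8 character base and an extension of at most 3."""
    base, _, ext = filename.partition(".")
    return 1 <= len(base) <= 8 and len(ext) <= 3 and "." not in ext


# Hypothetical digest-based name versus an illustrative numbered name.
digest_name = md5_name(b"example message body\n") + ".htm"
numbered_name = "msg00042.htm"

print(len(md5_name(b"example message body\n")))  # 32
print(fits_8_3(digest_name))    # False: the base is 32 characters long
print(fits_8_3(numbered_name))  # True
```

Truncating the digest to 8 characters would make it fit, but it would keep only 32 bits of the sum and so give up most of the collision resistance that motivated MD5 in the first place.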
> I am also not concerned about the lack of message-IDs: this problem
> becomes visible quite quickly in my setup, as articles are repeatedly
> added to the database on each archive build. When I spot this, I
> just edit the mbox file and add a message-id of the form
> poster's_name_YYMMDDHHMMSS_something_random@no-valid-msg-id
> (I know this is a prob for others, and I recognise the difficulty --
> I'm just saying it's not a prob for me, though I hope it would be
> supported for the benefit of others, esp those with more heavily
> automated systems).
v2.3 will create a message-id for messages w/o one. The id has
the string "NO-ID-FOUND" in it so one can tell the id was generated
by MHonArc.
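
As a rough sketch of the idea (in Python, and not the actual v2.3 code): if a message has no Message-ID header, synthesize one that carries the "NO-ID-FOUND" marker. The exact layout of the generated id below (an MD5 of the raw message placed before the marker) is an assumption made for illustration, and ensure_message_id is a hypothetical name.

```python
import hashlib
from email import message_from_string


def ensure_message_id(raw: str) -> str:
    """Return the message's Message-ID, or a generated one if it is missing.

    The generated form "<md5hex@NO-ID-FOUND>" is an assumed layout; only the
    presence of the "NO-ID-FOUND" string comes from the description above.
    """
    msg = message_from_string(raw)
    existing = msg.get("Message-ID")
    if existing and existing.strip():
        return existing.strip()
    # Derive the id from the message text so that reprocessing the same
    # message yields the same id instead of a new one each time.
    digest = hashlib.md5(raw.encode("utf-8")).hexdigest()
    return "<%s@NO-ID-FOUND>" % digest


with_id = "Message-ID: <abc@host.example>\nSubject: hi\n\nbody\n"
without_id = "Subject: hi\n\nbody\n"

print(ensure_message_id(with_id))
print(ensure_message_id(without_id))
```

Deriving the id from the message content keeps it stable across archive rebuilds, which avoids the repeated-addition problem described in the quoted text.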
> So it occurred to me that one way of implementing this would be to
> create a new .db file (e.g. filename.db), which would record the
> filenames used for each message ID and for each MD5 sum. That way
> the chances of a duplicate occurring are *very* low: it would require
> a duplicate MD5 sum *and* a duplicate or missing msg-id.
>
> AFAICS, mhonarc.db is wiped when the archives are rebuilt ... and all
> that would be needed is to ensure that filename.db is not wiped on a
> rebuild, and its data reused. That way, we could retain the current
> flexibility of filename format (which has other advantages, such as
> being reasonably transparent) and add permanency.
>
> How does that sound?
Changing the v2.x code base to support different filenames from the
current convention will take some work. Also, if such a feature
were to be added to v2.x, the current filename style should still be
supported. I.e., alternate schemes would be triggered by a resource.
Using message-ids (or MD5 sums) is something I will look into
for v2.x, but after v2.3 is released.
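
For reference, the persistent mapping proposed in the quoted text could be sketched roughly as follows. This is Python rather than MHonArc's Perl, and everything in it is an assumption for illustration: the FilenameMap class, the JSON storage format, and the msgNNNNN.htm naming. Each message is keyed on its message-id together with the MD5 sum of its body, so a false match requires a duplicate MD5 sum and a duplicate or missing message-id.

```python
import hashlib
import json
import os


class FilenameMap:
    """Persistent (message-id, MD5) -> filename map that survives rebuilds."""

    def __init__(self, path):
        self.path = path
        self.names = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                self.names = json.load(fh)

    def filename_for(self, message_id, body):
        """Return the recorded filename, or assign the next free number."""
        key = "%s|%s" % (message_id or "", hashlib.md5(body).hexdigest())
        if key not in self.names:
            # Names are handed out in sequence, so the count of entries
            # already recorded is the next unused number.
            self.names[key] = "msg%05d.htm" % len(self.names)
        return self.names[key]

    def save(self):
        """Write the map back to disk so the next rebuild can reuse it."""
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(self.names, fh, indent=1, sort_keys=True)
```

A rebuild would load the map, call filename_for() for every message in the mbox, and then call save(). Messages seen on an earlier build keep their old filenames regardless of the order in which they are processed, and only new messages receive new numbers.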
--ewh
----
Earl Hood | University of California: Irvine
[EMAIL PROTECTED] | Electronic Loiterer
http://www.oac.uci.edu/indiv/ehood/ | Dabbler of SGML/WWW/Perl/MIME