On Jul 20, 2007, at 9:21 AM, Stephen J. Turnbull wrote:

>> How likely is it that two messages with the same message-id and
>> date are /not/ duplicates?
> For message id generators that include a time-stamp in the generated
> id, approximately the same as the probability that two messages with
> the same message-id are not duplicates, no?

Good point, though clearly not all message-ids have timestamp  
information in them.  It does help explain why I see 600-odd more  
collisions when taking other data into account too.  I've modified my  
script to sort collisions and dupes into maildir folders, so I'll  
take a closer look when that finishes running (it takes a long time  
to slog through all 5 mboxes, even on a fairly zippy dual-G5).
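
For anyone who wants to try the same kind of sorting, here is a rough sketch of the classification step. It is not the actual script: the function names are made up, and it assumes a "dupe" means same message-id, same date and same body, while a "collision" means same message-id and date but a different body.

```python
import hashlib


def body_digest(msg):
    """Hash the payload text so two copies can be compared cheaply."""
    if msg.is_multipart():
        text = "".join(str(part.get_payload())
                       for part in msg.walk() if not part.is_multipart())
    else:
        text = str(msg.get_payload())
    return hashlib.sha1(text.encode("utf-8", "replace")).hexdigest()


def classify(messages):
    """Split messages into (unique, dupes, collisions).

    Hypothetical helper: the key is (Message-ID, Date); the body
    digest decides whether a repeat is a dupe or a collision.
    """
    seen = {}  # (message-id, date) -> body digest of the first copy
    unique, dupes, collisions = [], [], []
    for msg in messages:
        key = (str(msg.get("Message-ID", "")).strip(),
               str(msg.get("Date", "")).strip())
        digest = body_digest(msg)
        if key not in seen:
            seen[key] = digest
            unique.append(msg)
        elif seen[key] == digest:
            dupes.append(msg)
        else:
            collisions.append(msg)
    return unique, dupes, collisions

# For real data, something like classify(mailbox.mbox(path)) from the
# standard library should work; filing into maildirs is left out here.
```

Only the first copy's body is remembered per key, so a third message that matches a later collision would still be counted as a collision.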

>> Heck, at that point, I'd feel justified in simply automatically
>> rejecting the duplicate and chucking it from the archive.
> I'd rather not go there.  There may be applications for the archiver
> that require that all mail received be filed.

True.  It would ultimately be an archiver policy though.

> Counterproposal: have a "collisions" namespace, and provide an
> interface for the list owner to decide what to do with them.  They
> could be thrown away, they could be given an alternative global ID
> somehow and added (eg, the archive page could add a "See probable
> duplicates too" link), or they could be put into a moderation-like
> queue for list admins to decide about.

I like this.
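
To make the idea concrete, a minimal sketch of a store with a collisions namespace might look like the following. Everything here is an assumption for illustration: the class and method names, the way the id is hashed from Message-ID and Date, and the "-N" suffix used as the alternative id.

```python
import base64
import hashlib


def global_id(msg):
    """Assumed scheme: base32 of a SHA1 over Message-ID and Date."""
    h = hashlib.sha1()
    h.update(str(msg.get("Message-ID", "")).encode("utf-8", "replace"))
    h.update(str(msg.get("Date", "")).encode("utf-8", "replace"))
    return base64.b32encode(h.digest()).decode("ascii")


class MessageStore:
    """Hypothetical store that parks id collisions instead of failing."""

    def __init__(self):
        self.messages = {}    # global id -> first message filed under it
        self.collisions = {}  # global id -> later arrivals awaiting a decision

    def add(self, msg):
        gid = global_id(msg)
        if gid not in self.messages:
            self.messages[gid] = msg
            return gid, "archive"
        self.collisions.setdefault(gid, []).append(msg)
        return gid, "collisions"

    def resolve(self, gid, action):
        """Apply the list owner's decision to the oldest pending collision."""
        if action not in ("discard", "keep"):
            raise ValueError("unknown action: %r" % (action,))
        pending = self.collisions[gid]
        msg = pending.pop(0)
        if not pending:
            del self.collisions[gid]
        if action == "discard":
            return None
        # "keep": file under an alternative id; '-' is not in the base32
        # alphabet, so the suffixed id cannot clash with a real global id.
        n = 1
        while "%s-%d" % (gid, n) in self.messages:
            n += 1
        alt = "%s-%d" % (gid, n)
        self.messages[alt] = msg
        return alt
```

A moderation-like queue would just be a view over `collisions`, and the "See probable duplicates too" link could be built from the suffixed ids.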

>> So now, think of the interface to a message store that supports this
>> addressing scheme.  Well it's something like:
> I don't understand how the calling application is supposed to deal
> with a DuplicateMessageError exception since it should not change
> either the Message-ID or the Date if present.
> I see this as a major problem with any proposal to use only author
> headers in computing the "global id".

Mailman would probably log and ignore DuplicateMessageErrors.  It  
wouldn't be Mailman's responsibility to ensure the message gets  
archived, although I concede that as currently defined, you could end  
up with list copies that had a global id header that wasn't unique.   
OTOH, if the archiver implements a collision resolution policy such  
as a 'collisions' namespace, it wouldn't ever raise a DuplicateMessageError.
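
As a sketch of the "log and ignore" behavior on the caller's side: the exception name comes from the discussion above, but the store class, the `archive` function and the logger name are hypothetical stand-ins, not Mailman code.

```python
import logging

log = logging.getLogger("archiver-sketch")  # hypothetical logger name


class DuplicateMessageError(Exception):
    """Raised by a strict store when a global id is already taken."""


class StrictStore:
    """Stand-in for a store with no collision resolution policy."""

    def __init__(self):
        self._ids = set()

    def add(self, gid):
        if gid in self._ids:
            raise DuplicateMessageError(gid)
        self._ids.add(gid)


def archive(store, gid):
    """Try to archive; log and carry on if the id is a duplicate."""
    try:
        store.add(gid)
    except DuplicateMessageError:
        log.warning("duplicate global id %s; message not archived", gid)
        return False
    return True
```

With a store that files collisions in their own namespace, the `except` branch would simply never run.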

>> Or by using the global id, or by rejecting messages with duplicate
>> message ids.
> Er, the MTA has already accepted it.  Do you plan to generate a list
> manager bounce to the poster?  This has the unpleasant misfeature that
> it could be used to bounce spam off the list manager, since the poster
> needs to see content to determine whether this is a multiple send or
> actually the "intended version" after a "fat-finger" send; we already
> know the message-id isn't good enough.

Yes, this wouldn't be an MTA bounce, it would be a Mailman bounce.   
But it would have to be subject to the same bounce rules as any other  
auto-response which could be used as a spam vector, e.g. limit the  
number of bounces per time period and don't include the entire  
original message in the bounce (as both can be, and are, used as spam
vectors).
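
Those two rules could be sketched roughly as below. The class, the function, the default limits and the header list are all assumptions for illustration; this is not how Mailman's auto-response limits are implemented.

```python
import time
from collections import defaultdict, deque


class BounceLimiter:
    """Allow at most max_bounces per sender within a sliding period."""

    def __init__(self, max_bounces=3, period=3600.0, clock=time.monotonic):
        self.max_bounces = max_bounces
        self.period = period
        self.clock = clock
        self._sent = defaultdict(deque)  # sender -> times of recent bounces

    def allow(self, sender):
        now = self.clock()
        recent = self._sent[sender]
        # Forget bounces that have aged out of the window.
        while recent and now - recent[0] >= self.period:
            recent.popleft()
        if len(recent) >= self.max_bounces:
            return False
        recent.append(now)
        return True


def bounce_excerpt(msg, headers=("From", "Subject", "Date", "Message-ID")):
    """Return a few identifying headers, never the original body."""
    lines = []
    for name in headers:
        value = msg.get(name)
        if value is not None:
            lines.append("%s: %s" % (name, value))
    return "\n".join(lines)
```

The sender address is forgeable, of course, so a real limiter would probably also want a global cap.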

-Barry

Mailman-Developers mailing list
Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py