> What you gain from my proposal over a pure Message-ID approach > is guaranteed uniqueness given the list copy
Guarantee is a pretty strong word. A malicious person could post two messages with the same message-id, same date, but different bodies. Sometimes the channel between the MLM and the archive server will be SMTP, and spurious messages can be injected. Finally, from the archive server's perspective, some of the MLMs might make mistakes - just like from the MLM's perspective, some of MTAs might make mistakes in setting message-id. So I don't think the proposed SHA1(date, message-id) scheme buys a hard guarantee of uniqueness. Every component has to protect themselves, but none can solve the world's problems. So that moves us to how many collisions are reduced in practice. I have a question about the numbers Barry mined from the python lists. Are the collisions really that high? One should not count messages without a message-id, because the MLM can and should create one in that case. One should also not count collisions of messages going to different lists. Here's why. Let's say message M is cross posted to lists L1 and L2. Even though it is the same message, there are now two different contexts. (For example, people visit M at archive L1 should get a completely different experience if they hit "next message" and people visiting M at archive L2.) So I'd be curious what the collision numbers come to with these two factors taken into account. The other takeaway is list name really should be part of the URL to get proper context. The earlier example from Mharc does this. > and human friendlier urls. That's a very compelling point. SHA1 can't be computed inside someone's head or simple cut-n-pasted together for old messages, but I think the usability benefits of short URLs (short enough that they can comfortably fit inside message bodies) outweighs this drawback. By the way, is SHA-1 still in favor? My impression was it was fading away after the Shandong University team partially cracked it. > Throw it away or hide [Date]? The former would be a problem, > but not the latter. Thrown away. My favorite archival service is based on mhonarc, and raw mail goes into offline cold storage. Of course this can be changed for the future messages with some pain, but there's no reasonable way for myself (or any other mhonarc users in the same predicament) to retrofit against Date based URLs. For the record, here's what mhonarc embeds in each HTML page it produces because these were considered the important headers. In this message sent from Australia, the date shows a timezone of UTC -0700, because it was pulled from the received header. <!-- MHonArc v2.6.15 --> <!--X-Subject: [Gossip] Re: green-travel resources {webliographies} --> <!--X-From-R13: "[nephf Z. Saqvpbgg" <zraqvpbgNlnubb.pbz> --> <!--X-Date: Wed, 26 Apr 2006 00:27:27 -0700 --> <!--X-Message-Id: [EMAIL PROTECTED] --> <!--X-Content-Type: text/plain --> <!--X-Reference: [EMAIL PROTECTED] --> <!--X-Head-End--> So my main request is to double check the numbers, see if using "Date" really buys as much as one thinks. I'll keep digesting the other aspects of the wiki page. _______________________________________________ Mailman-Developers mailing list Mailman-Developers@python.org http://mail.python.org/mailman/listinfo/mailman-developers Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py Searchable Archives: http://www.mail-archive.com/mailman-developers%40python.org/ Unsubscribe: http://mail.python.org/mailman/options/mailman-developers/archive%40jab.org Security Policy: http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq01.027.htp