On 12/22/2010 5:26 AM, Lukáš Vlček wrote:
>
> What is the Mailman algorithm to number individual HTML representations
> of mails?

Sequential in order of arrival.

> My understanding was that once the new mail is received by Mailman then
> it is processed, appended to mbox accumulated file and put into
> private/public archive folder (i.e. HTML representation is rendered and
> stored on the disk). If the flow is that smooth then the numbering would
> really match the order of individual messages in accumulated mbox file.

This is correct. Further, the list is locked during this process so even
with "simultaneous" arrival of two messages to be archived, the order in
the .mbox should match the sequence in the pipermail archive.

> May be if the new message has to undergo admin moderation then this can
> influence the result numbering (resulting in numbering gaps?), but I am
> just speculating here...

No. It is not archived until after moderator approval.

> Do you think you could shed more light on the numbering process?
> To me it seems unfortunate that there is really no simple way how to
> determine valid URL for individual mails in mbox file.

The number in the archive *should* match the sequence in the .mbox. The
reasons why it doesn't include manual editing of the .mbox file, running
bin/arch to add messages to the archive without adding them in the same
sequence to the .mbox file, and messages with embedded, unescaped
"^From " lines in the body.

> > If you don't want to rebuild the pipermail archive and possibly renumber
> > messages, you will need to develop some script to go through the .mbox
> > and parse the archive period (year/month or whatever the period is in
> > your case) from the messages and search the nnnn.html files in that
> > directory for a match.
>
> Search for the match using Message-ID value?
> Message-ID is not always present in HTML version, is it? All I can see
> is that the Message-ID value is encoded into mailto: link as a
> In-Reply-To value. Other than that some advanced heuristics would have
> been used...

In Mailman 2.1.10 and later, the mailto: always contains the message-id
of this message in the In-Reply-To fragment. Prior to 2.1.10 there was
not always a message-id in the mailto: and if there was, it was not the
message-id of this message but rather the in-reply-to of this message.

I suggest you simply test your .mbox file to see if the sequence numbers
you generate from the From_ lines match those in the archive. As long as
you have not manually manipulated the .mbox or merged separate .mbox
files, there's a good chance this will be OK.

You don't have to check every single message. If the numbering is off,
there will be places where the numbering jumps from being correct to
"off by one" and then to "off by two", etc. I.e., I don't think you have
to worry about things like an mbox sequence of n, n+1, n+2, n+3, ...
corresponding to an archive sequence of n, n+2, n+1, n+3, ...

See the FAQ at <http://wiki.list.org/x/RIA9> for a description of what
happened to this list when the archive was rebuilt in 2006.

-- 
Mark Sapiro <m...@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
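
The test suggested above could be sketched roughly as follows in Python. This is only an illustration, not anything shipped with Mailman: the function names are made up, the regular expression assumes the mailto: link carries the id URL-encoded after "In-Reply-To=" (as in 2.1.10 and later), and the number of the first archived message (first_number) is an assumption you should check against your own archive. Python's mailbox module also has its own idea of what a From_ line is, which may not split the file exactly the way pipermail did.

```python
import mailbox
import re
from urllib.parse import unquote

# Assumed shape of the mailto: link in an archived NNNNNN.html page:
#   HREF="mailto:list%40example.com?Subject=...&In-Reply-To=%3Cid%40host%3E"
IN_REPLY_TO = re.compile(r'In-Reply-To=([^"&\s]+)', re.IGNORECASE)


def mbox_message_ids(mbox_path):
    """Return the Message-ID of each message in .mbox order.

    List index i corresponds to the i-th From_ line found by the
    mailbox module; a message without a Message-ID yields ''.
    """
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [(msg.get('Message-ID') or '').strip() for msg in box]
    finally:
        box.close()


def html_message_id(html_text):
    """Extract the id from the In-Reply-To fragment of the mailto: link.

    Returns None when no such fragment is present (possible before 2.1.10).
    """
    match = IN_REPLY_TO.search(html_text)
    return unquote(match.group(1)).strip() if match else None


def first_mismatch(mbox_ids, archive_ids, first_number=0):
    """Spot-check archive numbers against .mbox positions.

    archive_ids maps an archive number to the id read from that
    NNNNNN.html file; only the numbers present in it are checked.
    Returns the lowest checked number that disagrees, or None.
    """
    for number in sorted(archive_ids):
        index = number - first_number
        if not 0 <= index < len(mbox_ids):
            return number
        if mbox_ids[index] != archive_ids[number]:
            return number
    return None
```

Since the numbering, if wrong, should drift to "off by one", "off by two" and so on, it should be enough to fill archive_ids from a handful of pages spread across the archive and then narrow down around the first number reported.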