Thanks a lot, Mark. I appreciate this!
Regards,
Lukas

On 22.12.2010 18:35, "Mark Sapiro" <m...@msapiro.net> wrote:
> On 12/22/2010 5:26 AM, Lukáš Vlček wrote:
>>
>> What is the Mailman algorithm to number individual HTML representations
>> of mails?
>
> Sequential in order of arrival.
>
>> My understanding was that once a new mail is received by Mailman, it is
>> processed, appended to the accumulated mbox file and put into the
>> private/public archive folder (i.e. the HTML representation is rendered
>> and stored on disk). If the flow is that smooth, then the numbering would
>> really match the order of individual messages in the accumulated mbox file.
>
> This is correct. Further, the list is locked during this process, so even
> with "simultaneous" arrival of two messages to be archived, the order in
> the .mbox should match the sequence in the pipermail archive.
>
>> Maybe if the new message has to undergo admin moderation, this can
>> influence the resulting numbering (resulting in numbering gaps?), but I am
>> just speculating here...
>
> No. It is not archived until after moderator approval.
>
>> Do you think you could shed more light on the numbering process?
>> To me it seems unfortunate that there is really no simple way to
>> determine a valid URL for individual mails in the mbox file.
>
> The number in the archive *should* match the sequence in the .mbox. The
> reasons why it might not match include manual editing of the .mbox file,
> running bin/arch to add messages to the archive without adding them in the
> same sequence to the .mbox file, and messages with embedded, unescaped
> "^From " lines in the body.
>
>> If you don't want to rebuild the pipermail archive and possibly renumber
>> messages, you will need to develop some script to go through the .mbox
>> and parse the archive period (year/month or whatever the period is in
>> your case) from the messages and search the nnnn.html files in that
>> directory for a match.
>>
>> Search for the match using the Message-ID value?
>> The Message-ID is not always present in the HTML version, is it? All I
>> can see is that the Message-ID value is encoded into the mailto: link as
>> an In-Reply-To value. Other than that, some advanced heuristics would
>> have to be used...
>
> In Mailman 2.1.10 and later, the mailto: always contains the message-id
> of this message in the In-Reply-To fragment. Prior to 2.1.10 there was
> not always a message-id in the mailto:, and if there was, it was not the
> message-id of this message but rather the in-reply-to of this message.
>
> I suggest you simply test your .mbox file to see if the sequence numbers
> you generate from the From_ lines match those in the archive. As long as
> you have not manually manipulated the .mbox or merged separate .mbox
> files, there's a good chance this will be OK. You don't have to check
> every single message. If the numbering is off, there will be places
> where the numbering jumps from being correct to "off by one" and then to
> "off by two", etc. I.e., I don't think you have to worry about things
> like an mbox sequence of n, n+1, n+2, n+3, ... corresponding to an
> archive sequence of n, n+2, n+1, n+3, ... See the FAQ at
> <http://wiki.list.org/x/RIA9> for a description of what happened to this
> list when the archive was rebuilt in 2006.
>
> --
> Mark Sapiro <m...@msapiro.net>        The highway is for gamblers,
> San Francisco Bay Area, California    better use your sense - B. Dylan

------------------------------------------------------
Mailman-Users mailing list Mailman-Users@python.org
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: http://mail.python.org/mailman/options/mailman-users/archive%40jab.org
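Mark's suggested check (number the .mbox messages by their From_ lines and compare against the archive) can be sketched in Python. This is a minimal, unofficial sketch under stated assumptions, not a Mailman tool. It assumes Mailman 2.1.10 or later, so that each article's mailto: link carries that article's own Message-ID in the In-Reply-To fragment. It also assumes that article n is stored as a zero-padded file such as 000042.html, and that the first .mbox message corresponds to archive number `first_number` (assumed to be 0). `load_page` is a hypothetical callable you would supply to read article n from the right period directory. Because the numbering, as Mark describes it, only drifts from correct to off by one, then off by two, and so on, a binary search finds the first bad number while reading only about log2(n) pages.

```python
import html
import mailbox
import re
from typing import Callable, List, Optional
from urllib.parse import parse_qs, urlsplit

# Finds href="mailto:..." attributes in a pipermail article page.
MAILTO_RE = re.compile(r'href="(mailto:[^"]*)"', re.IGNORECASE)


def mbox_message_ids(mbox_path: str) -> List[Optional[str]]:
    """Message-IDs in .mbox order, i.e. in the order of the From_ lines."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        ids = []
        for msg in box:  # iterates messages in file order
            mid = msg.get("Message-ID")
            ids.append(mid.strip() if mid else None)
        return ids
    finally:
        box.close()


def archive_message_id(page_html: str) -> Optional[str]:
    """Message-ID taken from the In-Reply-To fragment of the page's mailto:
    link. Per the thread, this is the article's own Message-ID only in
    Mailman 2.1.10 and later."""
    for href in MAILTO_RE.findall(page_html):
        query = urlsplit(html.unescape(href)).query  # turn &amp; back into &
        values = parse_qs(query).get("In-Reply-To")
        if values:
            return values[0].strip()
    return None


def article_filename(number: int) -> str:
    """Assumed pipermail file name for article `number`, e.g. 000042.html."""
    return "%06d.html" % number


def first_drift(count: int, matches: Callable[[int], bool]) -> Optional[int]:
    """Binary search for the first index in [0, count) where matches() is
    False. Assumes that once the numbering is off it stays off, so
    matches() is True on a prefix of the indices only."""
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        if matches(mid):
            lo = mid + 1
        else:
            hi = mid
    return lo if lo < count else None


def find_first_mismatch(
    mbox_ids: List[Optional[str]],
    load_page: Callable[[int], str],
    first_number: int = 0,
) -> Optional[int]:
    """Archive number of the first article whose page does not carry the
    Message-ID of the corresponding .mbox message, or None if all agree.
    load_page(n) is a hypothetical callable returning the HTML of article n
    (e.g. read from the period directory using article_filename(n)).
    first_number is an assumed starting number for the archive."""

    def matches(i: int) -> bool:
        return archive_message_id(load_page(first_number + i)) == mbox_ids[i]

    index = first_drift(len(mbox_ids), matches)
    return None if index is None else first_number + index
```

For example, `find_first_mismatch(mbox_message_ids("list.mbox"), my_loader)` (both names hypothetical) returns None when every checked article agrees, or the archive number where the drift starts. Around that point, a body line beginning with an unescaped "From " is one of the likely causes Mark lists.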