On 11/09/2014 06:37 PM, Hal wrote:
>
> Investigating the MBOX files in a text editor I found the problematic
> ones to have headers starting with ">From " (without the quotes) which
> the working ones didn't, so I removed all those lines from a couple of
> MBOX files, imported into the Mailman archives and all looked fine!
> Obviously I can't check every single posting, so does my discovery and
> solution sound feasible?
Mailman's bin/arch is very liberal (actually too liberal, thus cleanarch) in what it accepts as a "From " separator in an mbox. It assumes that any line beginning with "From " is the start of a new message. Messages should look like

    From u...@example.com Sat Aug 16 15:10:02 2014
    Some-Header: ...
    Next-Header: ...
    ...
    Last-Header: ...

    first body line
    next body line
    ...
    last body line

i.e. the first line of each message is of the form

    From user date

This is followed by headers, each of which has a name ending with a colon and may be folded onto multiple lines as long as the continuation lines begin with a space or tab. The headers are terminated by an empty line (on the wire, the sequence <CR><LF><CR><LF>), and the rest, up to the next "From " separator, is the body.

Sometimes, in some mbox formats, message bodies have lines that begin with "From ". This confuses bin/arch into thinking a new message starts there.

bin/cleanarch looks at lines that begin with "From " and, if they don't look like

    From user date

or aren't followed by a header-like line, prefixes the line with ">" so it won't confuse bin/arch. If, however, your mbox has true "From " separators that don't look like

    From user date

(perhaps because the date format is wrong, or for some other reason), cleanarch will 'escape' them, which would be wrong.

So cleanarch may have munged your mboxes, or they may be weird for other reasons. In any case, I think you need to look at the original mboxes, maybe with something like "grep '^From '" (or maybe "egrep '^>?From '"), to verify that there is some kind of unescaped "From " line at the beginning of each message and that there are no unescaped "From " lines in message bodies, and possibly fix the problems manually.

Note that removing all lines starting with ">From " may be problematic: it could remove a body line if that body line originally started with "From ".
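To make the splitting rule concrete, here is a minimal Python sketch (not Mailman's actual bin/arch code) that treats every line beginning with "From " as the start of a new message, then splits one message into headers and body at the first empty line. The function names split_liberally and headers_and_body are hypothetical, chosen only for illustration.

```python
def split_liberally(text):
    """Split mbox text into messages (lists of lines), treating EVERY
    line that begins with "From " as the start of a new message."""
    messages = []
    current = []
    for line in text.splitlines():
        # Any "From " line starts a new message, even one in a body.
        if line.startswith("From ") and current:
            messages.append(current)
            current = []
        current.append(line)
    if current:
        messages.append(current)
    return messages


def headers_and_body(message):
    """Split one message (separator line first) into header lines and
    body lines at the first empty line. Folded continuation lines
    (starting with a space or tab) simply stay in the header list."""
    lines = message[1:]  # drop the "From " separator line
    if "" in lines:
        i = lines.index("")
        return lines[:i], lines[i + 1:]
    return lines, []  # no empty line: everything is treated as headers
```

With a body line such as "From here on, things go wrong.", split_liberally returns a second, bogus message whose first line is that body line and which has no real headers, which is the kind of confusion cleanarch is meant to prevent.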
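The cleanarch heuristic can be sketched the same way. This is an illustrative approximation, not the code bin/cleanarch actually runs: the separator pattern assumes an asctime-style date like the one in the example message, and the header test only checks for an RFC 2822 style field name (printable ASCII other than the colon) followed by a colon. The function name escape_suspicious_from_lines is hypothetical.

```python
import re

# Assumed separator shape: "From <sender> <asctime-style date>", e.g.
# "From u...@example.com Sat Aug 16 15:10:02 2014". Illustrative only;
# this is not the exact pattern bin/cleanarch uses.
SEPARATOR_RE = re.compile(
    r"^From \S+\s+[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{1,2}:\d{2}(:\d{2})? \d{4}"
)

# A header line starts with a field name made of printable ASCII
# characters 33-126 except the colon, immediately followed by a colon.
HEADER_RE = re.compile(r"^[\x21-\x39\x3b-\x7e]+:")


def escape_suspicious_from_lines(lines):
    """Return a copy of lines in which every "From " line that does not
    look like a separator, or is not followed by a header-like line,
    is prefixed with ">" so a liberal splitter will ignore it."""
    out = []
    for i, line in enumerate(lines):
        if line.startswith("From "):
            next_line = lines[i + 1] if i + 1 < len(lines) else ""
            if not (SEPARATOR_RE.match(line) and HEADER_RE.match(next_line)):
                line = ">" + line
        out.append(line)
    return out
```

Fed a list of lines in which a genuine separator is written with a different date layout, for example "From bob Sat 16 Aug 2014 15:10:02", this sketch escapes that separator too, which illustrates how a real "From " separator can be wrongly turned into ">From ".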
On the other hand, if such a ">From " line was in the headers, it would cause premature termination of the headers, which could be your issue, but I would wonder how such a line got there.

-- 
Mark Sapiro <m...@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

------------------------------------------------------
Mailman-Users mailing list Mailman-Users@python.org
https://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: https://mail.python.org/mailman/options/mailman-users/archive%40jab.org