Hi All--

Mark Sapiro wrote:
> Ivan Van Laningham wrote:
>> I ran cleanarch, yes, but all it did was to escape every single "From "
>> line, which would make arch think there was only one message.
>
> Then either the From line doesn't match the pattern
> mailbox.UnixMailbox._fromlinepattern or it is not followed immediately
> (with no intervening lines or maybe even '\r') by a line that looks
> like a message header.
>
> If there is intervening whitespace between the "From " line and the
> message headers, that may cause the spurious archived empty messages.
>
Ah.  Now we're getting somewhere.  Here are some sample "From " lines:

1)  From the current list.mbox (leading '> ' not part of actual line):

> From [EMAIL PROTECTED] Sun Mar 18 18:17:56 2007

2)  From the old mbox which I want to incorporate (leading '> ' inserted):

> From "robyn m. fritz" <[EMAIL PROTECTED]>

or

> From [EMAIL PROTECTED] (C Ryplansky)

And here is the _fromlinepattern:

    _fromlinepattern = r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+" \
                       r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*$"

Now, I don't understand much of this pattern, but it looks to me as if
a) there's no provision for matching " or < or > characters; and b) some
sort of date/time mark is required.

All the "From " lines are terminated with a \n, and all are followed
immediately by what look like valid message header lines, so I don't
think those are problems.

There do appear to be 1006 unescaped "From " lines in the old mbox:

$ grep '^From ' guppies-out.mbox | wc
  46295  163728 1800087
$ grep '^From: ' guppies-out.mbox | wc
  45289  159710 1803623

So, if I process the old mbox and convert the "From " lines without
dates into "From " lines without " and <> and add a date/time stamp, and
THEN run cleanarch, cleanarch should escape only the 1006 non-matching
"From " lines, and I should end up with an mbox I can combine with March,
April and May of 2007 from the current list.

Is that a correct assessment?
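A minimal sketch of that plan, for checking the idea before touching the
real mbox. The pattern is copied from above; everything else is an
assumption made for illustration: the helper names is_separator and
rewrite_from_line, the ADDRESS regex, the example.com addresses (the
archive shows the real ones only as "[EMAIL PROTECTED]"), and the
placeholder epoch-0 timestamp. In practice the date would presumably be
taken from each message's Date: header.

```python
import re
import time

# Pattern copied verbatim from the message above.
FROMLINE = re.compile(
    r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+"
    r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*$"
)

# Hypothetical helper pattern: the first token that looks like user@host.
ADDRESS = re.compile(r'[^\s<>()"]+@[^\s<>()"]+')


def is_separator(line):
    """Return True if the line matches the From-line pattern."""
    return FROMLINE.match(line) is not None


def rewrite_from_line(line, epoch_seconds):
    """Turn an old-style 'From ' line into 'From addr date'.

    Lines that already match, or that contain nothing resembling an
    address, are returned unchanged so cleanarch can escape them later.
    """
    if not line.startswith("From ") or is_separator(line):
        return line
    found = ADDRESS.search(line[len("From "):])
    if found is None:
        return line  # probably body text, not a message separator
    stamp = time.asctime(time.gmtime(epoch_seconds))
    return "From %s %s\n" % (found.group(0), stamp)


samples = [
    "From someone@example.com Sun Mar 18 18:17:56 2007\n",
    'From "robyn m. fritz" <robyn@example.com>\n',
    "From someone@example.com (C Ryplansky)\n",
    "From the start, it worked.\n",
]
for sample in samples:
    fixed = rewrite_from_line(sample, 0)
    print(is_separator(sample), is_separator(fixed), repr(fixed))
```

This only shows that the rewritten lines would satisfy the pattern. A
body line that happens to start with "From " and contains an address
would also be rewritten, so a real run should additionally check that the
next line looks like a message header.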

Metta,
Ivan
--
Ivan Van Laningham
God N Locomotive Works
http://www.pauahtun.org/
http://www.python.org/workshops/1998-11/proceedings/papers/laningham/laningham.html
Army Signal Corps:  Cu Chi, Class of '70
Author:  Teach Yourself Python in 24 Hours
------------------------------------------------------
Mailman-Users mailing list
Mailman-Users@python.org
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: http://mail.python.org/mailman/options/mailman-users/archive%40jab.org
Security Policy: http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq01.027.htp