Hi All--

Mark writes:
> >[Sangha] Anger and its expression Ryunyokingryunyo at earthlink.net
> >
> >The address is supposed to be "ryunyo at earthlink.net".
>
>
> No, it is supposed to be "[EMAIL PROTECTED]".
>

Not exactly.  On the index pages for the archives, index lines take one
of two forms:

# [Sangha] Welcome Home mag at swcp.com

or

# [Sangha] Welcome Home Pat Stacy

Nowhere are @ signs supposed to be used in the archives, or at least
not in the version of Mailman I'm using and the way I've got it set up.

I looked at .txt files produced by the new list and compared them with
the old .txt files I've got.

From lines from a new .txt file:
>>>
From erstad at nilsandreas.info Thu Feb 1 01:01:49 2007
From: erstad at nilsandreas.info (Nils Andreas Erstad)
<<<

From lines from an old .txt file:
>>>
From [EMAIL PROTECTED] Mon, 31 Jul 2000 23:34:46 -0600
Date: Mon, 31 Jul 2000 23:34:46 -0600
From: Ivan Van Laningham [EMAIL PROTECTED]
<<<

From lines from the existing mbox file:
>>>
From [EMAIL PROTECTED] Wed Feb 28 01:40:24 2007
Date: Tue, 27 Feb 2007 18:38:34 -0700
From: Ivan Van Laningham <[EMAIL PROTECTED]>
<<<

The difference appears to be in the address on the From: line: that is,
"Ivan Van Laningham [EMAIL PROTECTED]" fails but
"Ivan Van Laningham <[EMAIL PROTECTED]>" works.

If that's correct, I can modify those lines easily.

Metta,
Ivan

On 4/20/07, Mark Sapiro <[EMAIL PROTECTED]> wrote:
> Ivan Van Laningham wrote:
>
> >Hi All--
> >This is very helpful.  What I have are basically three sets of archives.
> >
> >1) Archives from the current list, fairly small and created about two
> >months ago after a disastrous ISP debacle (the yo-yos got themselves
> >_evicted_, for heaven's sake);
> >
> >2) Archives from the previous host and list incarnation and a much
> >earlier version--but still > 2.0--of Mailman;
> >
> >3) Archives from the previous host, same list, but a version of
> >Mailman that might have started with the digit one. ;-)  The person
> >who upgraded Mailman in Feb 2002 didn't bother to import the existing
> >archives, so now is the first time I've tried to import such old
> >archives.
> >
> >I have successfully dealt with 1 and 2.
> >Appending the two mboxes works well, probably because there is a
> >two-week gap between the two latest incarnations of the list.
> >
> >However, 3 is a problem.  I don't have an mbox for the earliest
> >archives; instead, I have the text files--2002-February.txt,
> >etc.--which appear to me to be in mbox format.
>
>
> The .txt files are similar to .mbox files, but there are various
> differences. Many headers have been removed and, most importantly,
> email addresses may have been obscured by changing [EMAIL PROTECTED] to
> user at example.com.
>
>
> >If I run cleanarch on these text files before running arch on them,
> >they do not appear in the archives.
>
>
> Probably because cleanarch escapes all the "From " separators because
> the email address has " at " instead of "@".
>
>
> >If I skip cleanarch, then I get
> >bad addresses in the posts in the archives (and yes, I did use the
> >--wipe option).  The bad addresses look like the following in the
> >index page:
> >
> >[Sangha] Anger and its expression Ryunyokingryunyo at earthlink.net
> >
> >The address is supposed to be "ryunyo at earthlink.net".
>
>
> No, it is supposed to be "[EMAIL PROTECTED]".
>
>
> >How can I preprocess the text files to fix the problem addresses?  I
> >assume it's because the old text files have something like From:
> >Ryunyo King<"ryunyo at earthlink.com"> in the from line.  Is there a
> >secret option to cleanarch I didn't see?
>
>
> cleanarch won't do this. You need to process the .txt files yourself
> with your own script or by hand to replace " at " with "@" in email
> addresses before using cleanarch.
>
> Obviously, you can't just globally replace " at " with "@" as there
> will be many occurrences of " at " outside email addresses.
>
> You might limit yourself to "From " lines and From: headers. That will
> probably work. You could also try to use some regexp that only matches
> " at " if it looks like it's in an email address.
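
Mark's suggestion of limiting the fix to "From " lines and From: headers,
combined with a regexp that only matches address-like text, could be
sketched in Python roughly as follows. This is only a minimal sketch, not
anything that ships with Mailman: the names OBSCURED and restore_addresses
are made up for illustration, and the regexp assumes that obscured
addresses always look like "user at host.domain".

```python
import re

# Assumption: an obscured address is "local at host.domain", with at least
# one dot in the host part. Plain prose such as "met at noon" has no dot
# after " at " and is therefore left alone.
OBSCURED = re.compile(
    r"([A-Za-z0-9._%+-]+) at ([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)"
)


def restore_addresses(lines):
    """Yield lines, turning ' at ' back into '@' on 'From ' separator
    lines and 'From:' headers only; every other line passes through."""
    for line in lines:
        if line.startswith("From ") or line.startswith("From:"):
            line = OBSCURED.sub(r"\1@\2", line)
        yield line


# Sample input taken from the new-style .txt excerpt quoted in this thread.
sample = [
    "From erstad at nilsandreas.info Thu Feb 1 01:01:49 2007\n",
    "From: erstad at nilsandreas.info (Nils Andreas Erstad)\n",
    "\n",
    "We met at noon.\n",
]
fixed = list(restore_addresses(sample))
```

The idea would be to run each .txt file through this, write the yielded
lines to a new file, and only then run cleanarch on the result. A body line
that happens to start with "From " and contains something address-like
would also be rewritten, so it is worth diffing the output against the
original before importing.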
>
>
> >(I also ended up with a slew of duplicates when the upgrade happened
> >in Feb 2002; half the messages are right, the other half of the
> >duplicate messages have addresses similar to the above.  But I'm
> >pretty sure I can deal with those.)
> >
> >Thanks for all the help.
> >
> >Metta,
> >Ivan
>
> --
> Mark Sapiro <[EMAIL PROTECTED]> The highway is for gamblers,
> San Francisco Bay Area, California better use your sense - B. Dylan
>
> ------------------------------------------------------
> Mailman-Users mailing list
> Mailman-Users@python.org
> http://mail.python.org/mailman/listinfo/mailman-users
> Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py
> Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
> Unsubscribe:
> http://mail.python.org/mailman/options/mailman-users/ivanlan9%40gmail.com
>
> Security Policy:
> http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq01.027.htp
>

--
Ivan Van Laningham
God N Locomotive Works
http://www.pauahtun.org/
http://www.python.org/workshops/1998-11/proceedings/papers/laningham/laningham.html
Army Signal Corps: Cu Chi, Class of '70
Author: Teach Yourself Python in 24 Hours