Quoting Alan McConnell ([EMAIL PROTECTED]):
> Meanwhile, I am adminning(sp?), through my ISP, a new but quite active
> E-list. But their mailman install is incomplete; they haven't put in
> Pipermail(about which I know _nothing_). I'm saving all the messages --
> mbox format -- and have the hope that when the Pipermail archiving
> program is installed, I will be able to collect, collate, shuffle,
> and massage these messages and then ship them off to the new very
> skilful tech staff that my ISP is allegedly hiring, and they will
> be able to slip this collection adroitly into place. And it will
> be as if archiving was always in place . . .
>
> Can this be done? or am I dreaming wild dreams?

I've just spent two days manipulating a bunch of mbox files into
archives. Let me tell you how it goes:

1. Blow away the html archives. You may prefer to use that arch command
   we were just discussing, but I used
   "rm -rf /var/lib/mailman/archives/private/<listname>"
2. Stop mailman's qrunners using "/etc/init.d/mailman stop"
3. Run bin/arch on the huge mbox file.
4. Discover that bin/arch is consuming all the memory and swap on the
   system, and your system has ground to a halt.
5. Kill bin/arch. Wait for the system to recover the swap space. At this
   point, I should have rebooted, because I think this is when my list's
   config.pck file got corrupted. Restore the config.pck file from
   backup.
6. Discover an awk script in the mailman archives that will split the
   mbox archive into manageable chunks. Fix it so that it splits them
   into 500-message chunks instead of the 80-message chunks it defaults
   to.
7. Run bin/arch on all the chunks, one at a time.
8. Discover that the mbox file had a bunch of un-escaped "From " lines
   that confused bin/arch, so you have a bunch of half-articles in
   today's archive page that shouldn't be there. Run bin/cleanarch to
   fix them, blow away the html archives, and then resplit the mbox file
   and run bin/arch on the splits.
9. Discover that in early 2000 some members of your mailing list were
   using a MUA that set the year to "100" in the "Date: " header, which
   confused bin/arch. Fix those up with sed, then blow away the html
   archives, then resplit the mbox and run bin/arch on the splits.
10. Discover a couple of "From " lines that bin/cleanarch didn't fix
    because somebody was quoting the mail headers of another message.
    Fix them with sed, then blow away the html archives, then resplit
    the mbox and run bin/arch on the splits.
11. Discover you missed a "From " line in one message, say "to hell with
    it", restart mailman, and go to bed.
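
For anyone who would rather not hunt down the awk script, steps 6 and 9 can be roughly sketched in Python. This is not the script from the archives and not anything Mailman ships; the function names (split_mbox_lines, fix_year_100, write_chunks) and the chunk file naming are made up for illustration. It assumes a message starts at a "From " line that is either the first line of the file or follows a blank line, and that the broken dates look like "Date: Mon, 3 Jan 100 12:00:00 -0500". It does not replace bin/cleanarch: an un-escaped "From " line after a blank line in a message body will still be treated as a new message.

```python
import re

# Assumed shape of the broken header: a month abbreviation followed by
# the literal year "100", e.g. "Date: Mon, 3 Jan 100 12:00:00 -0500".
_YEAR_100 = re.compile(
    r"^(Date: .*?\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) )100\b"
)


def fix_year_100(line):
    """Rewrite a year of "100" to "2000" on a line starting with "Date: ".

    This works line by line and does not track where the headers end,
    so a body line of the same shape would be rewritten as well.
    """
    return _YEAR_100.sub(r"\g<1>2000", line)


def split_mbox_lines(lines, per_chunk=500):
    """Yield lists of lines, each holding at most per_chunk messages."""
    chunk, count, prev_blank = [], 0, True
    for line in lines:
        # A "From " line at the start or after a blank line opens a message.
        if line.startswith("From ") and prev_blank:
            if count == per_chunk:
                yield chunk
                chunk, count = [], 0
            count += 1
        chunk.append(line)
        prev_blank = line.strip() == ""
    if chunk:
        yield chunk


def write_chunks(src_path, prefix, per_chunk=500):
    """Split src_path into files named <prefix>-0001.mbox, -0002.mbox, ..."""
    paths = []
    # latin-1 maps every byte to one character, so odd encodings survive
    # the round trip; newline="" keeps the original line endings.
    with open(src_path, encoding="latin-1", newline="") as src:
        fixed = (fix_year_100(line) for line in src)
        for n, chunk in enumerate(split_mbox_lines(fixed, per_chunk), 1):
            path = f"{prefix}-{n:04d}.mbox"
            with open(path, "w", encoding="latin-1", newline="") as out:
                out.writelines(chunk)
            paths.append(path)
    return paths
```

Calling write_chunks("list.mbox", "chunk") would leave a series of chunk-NNNN.mbox files that can then be fed to bin/arch one at a time, as in step 7. Reading the file lazily, line by line, keeps memory use to one chunk at a time.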

-- 
Paul Tomblin <[EMAIL PROTECTED]> http://blog.xcski.com/
"The means of defense against foreign danger historically have become
the instruments of tyranny at home." - James Madison