On Sun, 20 Oct 2019, Geert Uytterhoeven wrote:

> Hi all,
>
> I'm working to add this list to lore.kernel.org.

That's great news because lore.kernel.org is a search engine that
actually works.

> As one of prerequisites they require that we provide full existing
> archives of all list messages (or, at least, as complete as possible).
> I've collected mine already, but would really appreciate if you could
> pitch in from your own collection.
>
> Just follow the instructions on this page:
> https://korg.wiki.kernel.org/userdoc/lore
>

For anyone else attempting this, note that linux-m68k has two addresses,
so you need to pass two '-l' parameters:

-l linux-m68k.vger.kernel.org linux-m68k.lists.linux-m68k.org

The above wiki page neglects to mention that the 'list-archive-maker.py'
script has serious problems.

It can't deal with Alpine mboxes because they don't mangle "From" in
message bodies as ">From". This leads to truncated messages.

I strongly recommend that you enable the '-r' parameter and then examine
all of the rejected messages. You'll also need to edit the script so that
it doesn't capture messages that were rejected for obvious reasons (wrong
list-id), leaving only those rejected because of a messed-up message
boundary (i.e. a 'From ' mistakenly used as a message delimiter).

Another problem with that script is that it captures too much. It will
grab messages that appear to be cross-posted (based on To: or Cc:) even
if those messages never reached linux-m68k. I suppose the idea is that
capturing too much is better than too little?

The script fabricates a missing List-ID header based on a guess. I don't
know why it does this (bad idea from an archival perspective).

> I uploaded the list of message-ids that I already have to
> http://users.telenet.be/geertu/linux-m68k-message-ids.tar.xz
> You'll need it during the archive sanitization process to pass to the -k
> switch.
>
> Please tar up and xz -9 the resulting directory with mbox files and send
> the archive to me so I can add it to what I already have.
>
> The archives I used, from my personal email collection, are:
> 1. [email protected] 680x0 channel digest (May 1993 - March
>    1995)
>    Used initially. Probably there was never a non-digest version?
> 2. [email protected] (Dec 1994 - Dec 1995)
>    First real mailing list. Abandoned due to latency (most developers were
>    located in Europe and 2 Mbps transatlantic sucked).
> 3. [email protected] (Oct 1995 - Oct 2004)
>    Second mailing list. Abandoned due to spam and lack of admin activity.
>    I did my best to remove spam.
> 4. [email protected] (Oct 2004 - Current)
>    Current mailing list.
> As this is a single logical mailing list, the plan is to combine all of
> it in a single archive.
>
> My archive should be fairly complete, except for network outages, and e.g.
> the Gandi email disaster week 2 years ago. And I don't have anything from
> the real early days, unfortunately.
>

I'll let you know if I find any missing messages here.

> Note that sanitization script choked on some mails from the old
> phil.uni-sb.de list, so it didn't succeed for me.
>

Was that the "From" bug? I am experimenting with pre-processing of mboxes
to substitute the "From" lines in the message bodies. Not yet sure if
this will be entirely successful...

--

> Thanks!
>
> Gr{oetje,eeting}s,
>
> Geert
>
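
For illustration only, the kind of mbox pre-processing mentioned above
(escaping unmangled "From " lines in message bodies as ">From ") could be
sketched as follows. This is not the script discussed in this thread. It
assumes that genuine message separators follow a blank line (or start the
file) and look like "From <sender> <asctime date>"; the names
mangle_from_lines and mangle_file are hypothetical, and the pattern would
probably need loosening for mboxes written by other mail clients.

```python
import re

# Assumed shape of a real separator line, e.g.
#   From geert@example.org Sun Oct 20 12:34:56 2019
# (asctime-style date; the day of month may be space-padded).
SEPARATOR = re.compile(
    rb"From \S+ +(Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    rb"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    rb"[ 0-3]\d \d\d:\d\d:\d\d \d{4}\s*$"
)


def mangle_from_lines(lines):
    """Yield the lines of an mbox, turning body 'From ' lines into '>From '.

    A line is kept as a separator only if it follows a blank line (or is
    the first line) and matches SEPARATOR. Every other line that starts
    with 'From ' is treated as body text and gets a '>' prefix.
    """
    prev_blank = True  # the start of the file counts as "after a blank line"
    for line in lines:
        if line.startswith(b"From "):
            if not (prev_blank and SEPARATOR.match(line)):
                line = b">" + line
        prev_blank = line.strip() == b""
        yield line


def mangle_file(src_path, dst_path):
    """Write a copy of the mbox at src_path with body 'From ' lines escaped."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.writelines(mangle_from_lines(src))
```

A body line that happens to follow a blank line and also matches the
separator pattern would still be taken for a message boundary, so the
output is worth checking against the '-r' rejects before relying on it.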
