On Tue, 22 Oct 2019, Geert Uytterhoeven wrote:

> > > Note that sanitization script choked on some mails from the old
> > > phil.uni-sb.de list, so it didn't succeed for me.
> >
> > Was that the "From" bug? I am experimenting with pre-processing of
> > mboxes to substitute the "From" lines in the message bodies. Not yet
> > sure if this will be entirely successful...
>
> Possibly, my old archives were stored in Alpine mboxes.
>

Here are some variations on the From issue:

1) in Message-ID: <[email protected]>, I find that the attachment begins
with an "escaped From":

	>From de4c0f12fd2fd3e8436218dfb5edba3b3d570ee0 Mon Sep 17 00:00:00 2001

I don't think alpine did this. I think it was sent that way.

2) in Message-ID: <[email protected]> which is missing from your archive,
there is this line:

	From .

3) in Message-ID: <[email protected]> which is in your archive, there is
this line:

	From the outside it looks like there are indeed a whish to do so

(I added the tab indentation to avoid even more MUA escapades.)

Now look what happened to 3) when it reached lore.kernel.org:

https://lore.kernel.org/lkml/[email protected]/

Note that the escape now shows up in the html! The original (according to
alpine) has no ">From the outside", instead it has "From the outside".

That means that if I insert a ">" into message 2) above, to "escape" the
"From" and make the importer happy, then lore.kernel.org will incorrectly
render that escape. Similarly, Alpine will also render that modification
as ">From" because it doesn't recognize the so-called "escape".

And if I don't do that, the importer will truncate the message...
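
To illustrate the escaping problem, here is a rough sketch of a reversible
pre-processing step. It assumes mboxrd-style quoting (every body line
matching ^>*From<space> gets one extra ">", and exactly one ">" is removed
when reading back). That convention is an assumption for illustration; it
is not necessarily what the importer, Alpine or lore.kernel.org implement,
and the helper names escape_body() and unescape_body() are made up.

```python
import re

# mboxrd-style quoting (assumed here, not confirmed for any tool in this
# thread): a body line that starts with zero or more ">" followed by
# "From " gets one more ">" prepended; reading strips exactly one ">".
_QUOTE_RE = re.compile(r"^(>*From )")
_UNQUOTE_RE = re.compile(r"^>(>*From )")


def escape_body(lines):
    """Hypothetical helper: quote From-like lines in a message body."""
    return [_QUOTE_RE.sub(r">\1", line) for line in lines]


def unescape_body(lines):
    """Hypothetical helper: undo escape_body()."""
    return [_UNQUOTE_RE.sub(r"\1", line) for line in lines]


body = [
    # Case 1: already sent with a literal ">From" in the attachment.
    ">From de4c0f12fd2fd3e8436218dfb5edba3b3d570ee0 Mon Sep 17 00:00:00 2001",
    # Case 2: a bare From line that an importer may take for a separator.
    "From .",
    # Case 3: ordinary prose that happens to start with "From ".
    "From the outside it looks like",
    # Indented lines do not match and are left alone.
    "\tFrom the outside it looks like",
]

escaped = escape_body(body)
restored = unescape_body(escaped)
```

The round trip only works if the reader undoes the quoting. A reader that
does not know about the convention will display the extra ">" literally,
which is the rendering problem described above.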
