Hi,

I've recently imported some sizable archives (~100k messages) of old
mailing lists and noticed some slight inconveniences:

1) Date: headers that are invalid per RFC 5322/822 should be parsed more gracefully

Some old mails had Date: headers without time zones, e.g.
Date: Sat, 27 Sep 1997 10:02:32

public-inbox then treats such a message as if it had the current date.
This assumption makes no sense (literally every other guess would be
more likely), and it puts these messages on the first page of the
archive.  Furthermore, sorting is then not stable: pressing F5 makes
the threads jump around.  I'd recommend falling back to +0000 instead.
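
To illustrate the fallback I have in mind, here is a minimal Python sketch. It is not public-inbox's actual (Perl) code; the function name parse_date_header is made up for this example. It relies on the standard library's email.utils.parsedate_to_datetime, which returns a naive datetime when the header carries no usable zone, and then assumes +0000 instead of "now":

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_date_header(value):
    """Parse a Date: header value; assume +0000 if the zone is missing.

    Hypothetical helper for illustration only.
    Returns None if the value cannot be parsed at all.
    """
    try:
        dt = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError, newer ones ValueError.
        return None
    if dt.tzinfo is None:
        # No time zone in the header: fall back to UTC rather than
        # substituting the current date.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


if __name__ == "__main__":
    print(parse_date_header("Sat, 27 Sep 1997 10:02:32"))
```

With this, the 1997 message keeps its 1997 date, so it sorts the same way on every page load.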

2) Weird From: lines crash the whole import

From: "=?iso-8859-1?Q?Jochen_K=FCpper?= <usenet"@jochen-kuepper.de

This funny line broke import_maildir:

fatal: Missing > in ident string: =?iso-8859-1?Q?Jochen_K=FCpper?= usenet <"=?iso-8859-1?Q?Jochen_K=FCpper?= <usenet"@jochen-kuepper.de> 1101853296 +0100
fast-import: dumping crash report to /var/lib/public-inbox/repositories/ding.git/fast_import_crash_31402
EOF from fast-import:  at /usr/share/perl5/vendor_perl/PublicInbox/Import.pm line 96, <$r> line 54681.

I fixed it manually.  (But I think it's actually a valid mail address,
even in this botched state.)  I'm not sure what added the ">", it's
not in the original mail.
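
For what it's worth, git fast-import does not allow "<", ">" or newlines inside the name or the email part of an ident line, so one possible workaround would be to strip those bytes before writing the ident. A rough Python sketch of that idea follows; fast_import_ident is a made-up name, this is not how public-inbox builds the line, and note that stripping changes the stored address:

```python
def fast_import_ident(name, email, timestamp, tz_offset):
    """Build a 'Name <email> timestamp tz' ident string for git fast-import.

    Hypothetical helper for illustration only.  fast-import rejects
    '<', '>' and newlines inside the name or email, so they are dropped.
    """
    forbidden = str.maketrans("", "", "<>\n")
    clean_name = name.translate(forbidden).strip()
    clean_email = email.translate(forbidden).strip()
    return f"{clean_name} <{clean_email}> {timestamp} {tz_offset}"


if __name__ == "__main__":
    # Values taken from the crash message above.
    print(fast_import_ident(
        "=?iso-8859-1?Q?Jochen_K=FCpper?= usenet",
        '"=?iso-8859-1?Q?Jochen_K=FCpper?= <usenet"@jochen-kuepper.de',
        1101853296,
        "+0100",
    ))
```

The resulting line contains exactly one "<" and one ">", which is what fast-import expects.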

(I use public-inbox-1.3.0/git-2.25.0 on Void Linux.)

thx,
-- 
Leah Neukirchen  <[email protected]>  https://leahneukirchen.org/
--
unsubscribe: one-click, see List-Unsubscribe header
archive: https://public-inbox.org/meta/
