Eric Wong <[email protected]> wrote:
> Leah Neukirchen <[email protected]> wrote:
> > 2) Weird From: lines crash the whole import
> >
> > From: "=?iso-8859-1?Q?Jochen_K=FCpper?= <usenet"@jochen-kuepper.de
> >
> > This funny line broke import_maildir:
> >
> > fatal: Missing > in ident string: =?iso-8859-1?Q?Jochen_K=FCpper?= usenet
> > <"=?iso-8859-1?Q?Jochen_K=FCpper?= <usenet"@jochen-kuepper.de> 1101853296
> > +0100
> > fast-import: dumping crash report to
> > /var/lib/public-inbox/repositories/ding.git/fast_import_crash_31402
> > EOF from fast-import: at
> > /usr/share/perl5/vendor_perl/PublicInbox/Import.pm line 96, <$r> line 54681.
> >
> > I fixed it manually.  (But I think it's actually a valid mail address,
> > even in this botched state.)  I'm not sure what added the ">", it's
> > not in the original mail.
> >
> > (I use public-inbox-1.3.0/git-2.25.0 on Void Linux.)
>
> Gah, this looks like it's because Email::Address::XS leaves a
> "<" in the name...  Perhaps Import should delete all [<>]
> characters unconditionally?  (or swap in appropriate Unicode
> homographs and assume users have the necessary glyphs...)

So we already do `$name =~ tr/<>//d', so I think doing the same
with `$email' is appropriate for fast-import.  The "correct"
address featuring '<' will still be indexed in Xapian, at least.

-------------8<-------------
Subject: [PATCH] import: drop '<' and '>' characters in addresses

Some strange "From:" lines will cause Email::Address::XS to
leave '<' (and presumably '>') in the address, which
git-fast-import won't accept even if quoted.

Work around this problem by deleting '<' and '>' the same way we
delete them for the ident name.

Reported-by: Leah Neukirchen <[email protected]>
Link: https://public-inbox.org/meta/[email protected]/
---
 lib/PublicInbox/Import.pm | 4 ++++
 t/import.t                | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index d8dc49b8..68dc0c7e 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -293,6 +293,10 @@ sub extract_cmt_info ($) {
 		}
 	}
 	if (defined $email) {
+		# Email::Address::XS may leave quoted '<' in addresses,
+		# which git-fast-import doesn't like
+		$email =~ tr/<>//d;
+
 		# quiet down wide character warnings with utf8::encode
 		utf8::encode($email);
 	} else {
diff --git a/t/import.t b/t/import.t
index e71dd714..b88d308e 100644
--- a/t/import.t
+++ b/t/import.t
@@ -55,6 +55,8 @@ $im->done;
 my @revs = $git->qx(qw(rev-list HEAD));
 is(scalar @revs, 1, 'one revision created');
 
+my $odd = '"=?iso-8859-1?Q?J_K=FCpper?= <usenet"@example.de';
+$mime->header_set('From', $odd);
 $mime->header_set('Message-ID', '<[email protected]>');
 $mime->header_set('Subject', 'msg2');
 like($im->add($mime, sub { $mime }), qr/\A:\d+\z/, 'added 2nd message');
-- 
unsubscribe: one-click, see List-Unsubscribe header
archive: https://public-inbox.org/meta/
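
For illustration only, here is a minimal standalone sketch of why the
deletion matters for fast-import idents.  It assumes the ident has the
form "name <email> timestamp tz", so '<' and '>' may only appear as the
delimiters around the address.  `ident_line' is a hypothetical helper,
not a function in PublicInbox::Import; the name and address are the
ones from the crash report quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PublicInbox::Import): builds an
# ident of the form "name <email> timestamp tz".
sub ident_line {
	my ($name, $email, $ts, $tz) = @_;
	# '<' and '>' delimit the address inside an ident, so neither
	# field may contain them; tr///d deletes every occurrence.
	$name =~ tr/<>//d;
	$email =~ tr/<>//d;
	"$name <$email> $ts $tz";
}

# Values taken from the crash report; the address still carries the
# stray '<' left inside the quoted local part.
my $name = '=?iso-8859-1?Q?Jochen_K=FCpper?= usenet';
my $email = '"=?iso-8859-1?Q?Jochen_K=FCpper?= <usenet"@jochen-kuepper.de';
my $ident = ident_line($name, $email, 1101853296, '+0100');
print "$ident\n";
```

After the deletion, the only '<' and '>' left in the ident are the
delimiters around the address.  The stored address is slightly
mangled as a result, which is the trade-off the patch accepts.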
