Your message dated Mon, 8 May 2006 16:37:47 -0400
with message-id <[EMAIL PROTECTED]>
has caused the Debian Bug report #365151,
regarding libmail-mbox-messageparser-perl: message splitting breaks
to be marked as having been forwarded to the upstream software
author(s) Eduard Bloch <[EMAIL PROTECTED]>, [EMAIL PROTECTED], David Coppit
<[EMAIL PROTECTED]>.
(NB: If you are a system administrator and have no idea what I am
talking about, this indicates a serious mail system misconfiguration
somewhere. Please contact me immediately.)
Debian bug tracking system administrator
(administrator, Debian Bugs database)
--- Begin Message ---
Eduard Bloch wrote:
> my program mail-expire uses this module to split mbox files into
> individual messages. Sometimes, however, the end of file is reported too
> early and data is _lost_ because of that. I have not investigated the
> issue yet; the test data is in:
> http://people.debian.org/~blade/debian-user-german.Apr_2006.bz2
> and the current version of the script is attached, with debugging output
> enabled. If you look at that, it stops splitting the contents at
> <[EMAIL PROTECTED]> and returns the rest as one big message.
Looks like the problem here is the parsing of the MIME boundary parameter in the Content-Type header. The header
looks like this:
Content-Type: multipart/signed; boundary=Sig_vBdOhvW1OXTFVp5Uz7Tcu_+;
        protocol="application/pgp-signature"; micalg=PGP-SHA1
Note that the boundary string is not quoted. The library parses it
with this:
# Are nonquoted parameter values allowed to have spaces? I assume not.
if ($content_type_header =~ /boundary *= *"([^"]*)"/i ||
$content_type_header =~ /boundary *= *\b(\S+)\b/i)
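This matches "Sig_vBdOhvW1OXTFVp5Uz7Tcu_" out of the string, leaving off
the "+" at the end: the trailing \b needs a transition between a word
character and a non-word character, and since neither "+" nor ";" is a
word character, the match backtracks until it ends at the "_". That does
not conform to RFC 2046, which allows the boundary to contain: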
    boundary := 0*69<bchars> bcharsnospace
    bchars := bcharsnospace / " "
    bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" /
                     "+" / "_" / "," / "-" / "." /
                     "/" / ":" / "=" / "?"
(And yes, the bchars rule even allows spaces, AFAICS, although RFC 2045's
parameter syntax strictly requires a value containing spaces to be quoted.)
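As a cross-check of the grammar above, here is a minimal sketch in Perl that transcribes it into a validating regex: zero to 69 bchars followed by exactly one bcharsnospace, so a boundary is 1 to 70 characters and cannot end in a space. The is_valid_boundary helper and the sample strings are hypothetical, written only for illustration; this is not code from Mail::Mbox::MessageParser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / "+" / "_" / "," /
#                  "-" / "." / "/" / ":" / "=" / "?"
my $bcharsnospace = qr{[0-9A-Za-z'()+_,\-./:=?]};

# bchars := bcharsnospace / " "
my $bchars = qr{(?:$bcharsnospace| )};

# boundary := 0*69<bchars> bcharsnospace
# i.e. 1 to 70 characters in total, and the last one may not be a space.
sub is_valid_boundary {
    my ($s) = @_;
    return $s =~ /\A(?:$bchars){0,69}$bcharsnospace\z/ ? 1 : 0;
}

# Hypothetical sample strings: the boundary from this report, one with an
# inner space, one with a trailing space, and one that is too long.
for my $candidate ('Sig_vBdOhvW1OXTFVp5Uz7Tcu_+', 'has a space', 'trailing ', 'x' x 71) {
    printf "%-4s %s\n", is_valid_boundary($candidate) ? 'ok' : 'bad', $candidate;
}
```

Note that the {0,69} quantifier is applied to a (?:...) group rather than directly after $bchars, so Perl cannot mistake it for a hash subscript during interpolation.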
This should work better: it passes the test suite and successfully
parses the mailbox from this bug report.
Index: Grep.pm
===================================================================
--- Grep.pm (revision 12420)
+++ Grep.pm (working copy)
@@ -177,9 +177,8 @@
my $content_type_header = $1;
$content_type_header =~ s/$endline//g;
- # Are nonquoted parameter values allowed to have spaces? I assume not.
if ($content_type_header =~ /boundary *= *"([^"]*)"/i ||
- $content_type_header =~ /boundary *= *\b(\S+)\b/i)
+ $content_type_header =~ /boundary *= *([-0-9A-Za-z'()+_,.\/:=? ]*[-0-9A-Za-z'()+_,.\/:=?])/i)
{
return $1
}
Index: Perl.pm
===================================================================
--- Perl.pm (revision 12420)
+++ Perl.pm (working copy)
@@ -248,9 +248,8 @@
my $content_type_header = $1;
$content_type_header =~ s/$endline//g;
- # Are nonquoted parameter values allowed to have spaces? I assume not.
if ($content_type_header =~ /boundary *= *"([^"]*)"/i ||
- $content_type_header =~ /boundary *= *\b(\S+)\b/i)
+ $content_type_header =~ /boundary *= *([-0-9A-Za-z'()+_,.\/:=? ]*[-0-9A-Za-z'()+_,.\/:=?])/i)
{
return $1
}
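To see the difference on the header from this report, the following minimal sketch runs the old and the patched expressions side by side. The wrapper subs old_boundary and new_boundary are hypothetical; the regexes are the ones from the diff (the patched one written on a single line), and the header is assumed to have already been unfolded onto one line, as the library does with s/$endline//g.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Content-Type value from the bug report, assumed already unfolded onto one
# line (the library strips line endings with s/$endline//g first).
my $header = 'multipart/signed; boundary=Sig_vBdOhvW1OXTFVp5Uz7Tcu_+; '
           . 'protocol="application/pgp-signature"; micalg=PGP-SHA1';

# Old code: the trailing \b needs a word/non-word transition. Neither "+"
# nor ";" is a word character, so the match backtracks until it ends at "_".
sub old_boundary {
    my ($h) = @_;
    return $1 if $h =~ /boundary *= *"([^"]*)"/i
              || $h =~ /boundary *= *\b(\S+)\b/i;
    return;
}

# Patched code: zero or more bchars, then one bcharsnospace (RFC 2046),
# so the capture may end in "+" but stops before the ";" separator.
sub new_boundary {
    my ($h) = @_;
    return $1 if $h =~ /boundary *= *"([^"]*)"/i
              || $h =~ /boundary *= *([-0-9A-Za-z'()+_,.\/:=? ]*[-0-9A-Za-z'()+_,.\/:=?])/i;
    return;
}

print 'old: ', old_boundary($header), "\n";    # prints "old: Sig_vBdOhvW1OXTFVp5Uz7Tcu_"
print 'new: ', new_boundary($header), "\n";    # prints "new: Sig_vBdOhvW1OXTFVp5Uz7Tcu_+"
```

The quoted-string alternative is identical in both subs; only the fallback for unquoted values changes, which is why quoted boundaries were never affected by this bug.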
--
see shy jo
--- End Message ---