On August 12, 2006 at 13:28, "Jeff Breidenbach" wrote:

> The majority of mbox files I've been handed do not escape "From" like
> they should, and this causes problems on M-A's end; inc from the nmh
> suite gets unhappy and starts trashing messages. Are there any
> recommendations for an mbox2mbox converter that will clean up
> these wayward almost-but-not-quite-mbox files?

Depends on how the bogus "From" lines are structured.

In MHonArc, the MSGSEP resource can be set to a stricter pattern,
which gets around most cases of unescaped "From " lines.

For your case, a simple Perl script can be used to do what you
want.  Maybe something like:

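  #!/usr/bin/perl
  use strict;
  use warnings;

  # Strict pattern for a genuine mbox separator line, e.g.
  # "From user@host Sat Aug 12 13:28:00 2006".
  my $msgsep =
    qr/^From\s+(?:"[^"]+"\@\S+|\S+)\s+\S+\s+\S+\s+\d+\s+\d+:\d+:\d+\s+\d+/;
  while (<>) {
    # Pass through non-"From " lines and genuine separators unchanged.
    if (!/^From / || /$msgsep/) {
      print STDOUT $_;
      next;
    }
    # A "From " line that fails the strict check: escape it.
    print STDOUT '>'.$_;
  }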

If you call the above "escapefrom", invoke like the following:

  escapefrom mbox > escaped-mbox

Then run a diff to see how well it worked.
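
Before running it over a large mailbox, it may help to try the
pattern on a few lines by hand.  The following is only a sketch: the
classify() helper and the sample lines are made up for illustration,
and it assumes the intent is that a "From " line failing the strict
pattern is a bogus one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same separator pattern as in the script above.
my $msgsep =
  qr/^From\s+(?:"[^"]+"\@\S+|\S+)\s+\S+\s+\S+\s+\d+\s+\d+:\d+:\d+\s+\d+/;

# Hypothetical helper: label one line as 'other' (does not start with
# "From "), 'separator' (passes the strict check), or 'bogus'.
sub classify {
  my ($line) = @_;
  return 'other'     if $line !~ /^From /;
  return 'separator' if $line =~ $msgsep;
  return 'bogus';
}

# Made-up sample lines, for illustration only.
my @samples = (
  "From jdoe\@example.com Sat Aug 12 13:28:00 2006\n",
  "From what I can tell, this is body text.\n",
  "Subject: From here to there\n",
);
printf "%-9s %s", classify($_), $_ for @samples;
```

If lines from your own mbox files come out labeled differently than
you expect, the pattern needs adjusting before you trust the output.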

The main limitation is when messages include well-formed mbox "From "
lines unescaped in their bodies.  In that case, it takes a human to
determine whether the line starts a new message or is part of an
existing one.
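
To narrow down what a human has to look at, one heuristic (my
assumption here, not something mbox guarantees everywhere) is that a
genuine separator normally follows a blank line, since messages in
common mbox variants end with one.  A rough sketch that lists
separator-looking lines that do not follow a blank line; suspects()
is a made-up name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same separator pattern as in the script above.
my $msgsep =
  qr/^From\s+(?:"[^"]+"\@\S+|\S+)\s+\S+\s+\S+\s+\d+\s+\d+:\d+:\d+\s+\d+/;

# Hypothetical helper: return [line_number, text] pairs for lines that
# match the pattern but are not preceded by a blank line.
sub suspects {
  my (@lines) = @_;
  my @found;
  my $prev_blank = 1;   # treat the start of the file as following a blank
  for my $i (0 .. $#lines) {
    my $line = $lines[$i];
    push @found, [$i + 1, $line] if $line =~ $msgsep && !$prev_blank;
    $prev_blank = ($line =~ /^\s*$/) ? 1 : 0;
  }
  return @found;
}

# Only read input when a file name is given on the command line.
if (@ARGV) {
  my @lines = <>;
  printf "%d: %s", @$_ for suspects(@lines);
}
```

This does not catch a quoted "From " line that happens to follow a
blank line inside a body, so it reduces the manual review rather than
replacing it.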

If your MDA creates a "From " line that is unique to your site, you
can modify the above regex to just match that.

--ewh

_______________________________________________
Discussion list for The Mail Archive
Gossip@jab.org
http://jab.org/cgi-bin/mailman/listinfo/gossip