It is fairly straightforward to write such a utility to go from HTML back
to mbox. Remember hyp2mbox.pl
<http://www.albany.net/~anthony/archivedemo> from some time ago? That same
script can be used as a starting point for a mhonarc2mbox script.
While Erik's original question is over two weeks old by now, I too have
a project that requires me to morph hyp2mbox.pl into this new
utility. Which brings me to a feature request.
For the msg*.html files that are derived from multipart messages:
1) Save the boundary markers of the message parts
2) Retain the MIME headers of those message parts that are not
displayed inline, inside some X- comment.
Something like:
<!--X-MsgBody-->
<!--X-Subject-Header-Begin-->
... expanded value of SUBJECTHEADER resource ...
<!--X-Subject-Header-End-->
<!--X-Head-of-Message-->
... converted message header fields ...
<!--X-Head-of-Message-End-->
<!--X-Head-Body-Sep-Begin-->
... expanded value of HEADBODYSEP resource ...
<!--X-Head-Body-Sep-End-->
<!--X-Body-of-Message-->
... converted message body ...
<!--X-Body-Part-Begin: goobly-gook-uniquestring -->
<!--X-Content-type: Type/Subtype; name=some.doc.name -->
<!--X-Content-disposition: attachment ; filename=some.doc.name -->
<!--X-Content-transfer-encoding: whatever -->
<P><A HREF="bin00002.bin" >Some.file.name</A></P>
<!--X-Body-Part-End: goobly-gook-uniquestring -->
<!--X-Body-Part-Begin: goobly-gook-uniquestring -->
<!--X-Content-type: Type/Subtype; name=some.other-doc.name -->
<!--X-Content-disposition: attachment; filename=some.other-doc.name -->
<!--X-Content-transfer-encoding: whatever -->
<P><A HREF="bin00002.bin" >Some.other-doc.name</A></P>
<!--X-Body-Part-End: goobly-gook-uniquestring -->
<!--X-Body-of-Message-End-->
<!--X-MsgBody-End-->
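To show how a mhonarc2mbox script could use such comments, here is a
minimal Python sketch. It assumes the X-Body-Part-Begin/End and
X-Content-* comments exactly as proposed above, which MHonArc does not
emit today; the function name extract_body_parts is made up for
illustration.

```python
import re

# Hypothetical markers: these follow the proposal above and are not
# something MHonArc is known to write.
PART_RE = re.compile(
    r"<!--X-Body-Part-Begin:\s*(?P<boundary>\S+)\s*-->"
    r"(?P<content>.*?)"
    r"<!--X-Body-Part-End:\s*(?P=boundary)\s*-->",
    re.DOTALL,
)
HEADER_RE = re.compile(r"<!--X-(Content-[A-Za-z-]+):\s*(.*?)\s*-->")


def extract_body_parts(html):
    """Return a list of (boundary, headers, remaining_html) tuples."""
    parts = []
    for match in PART_RE.finditer(html):
        content = match.group("content")
        # Collect the saved MIME headers, e.g. {"Content-type": "..."}.
        headers = dict(HEADER_RE.findall(content))
        # What is left after removing the header comments is the link
        # (or inline data) that was generated for this part.
        remaining = HEADER_RE.sub("", content).strip()
        parts.append((match.group("boundary"), headers, remaining))
    return parts
```

With the boundary string and the saved headers in hand, the script would
only need to re-encode the linked file to rebuild each MIME part.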
On Tue, 24 Oct 2000, Earl Hood wrote:
> On October 24, 2000 at 11:19, Erik Rossen wrote:
>
> > OK, I guess I have a job to do. I'm kind of surprised that no one has ever
> > asked for this capability before, though. I guess everyone (except me) is
>
> It has been brought up before. But maybe only once or twice.
>
> > smart enough to keep the original mbox around just in case. I hope that
> > the people over at www.mail-archive.com are doing that in case VA Linux
> > ever decides to pull the plug. They are managing over 5,000 mailing lists
> > with Mhonarc!
>
> There is a mailing list associated with the maintenance of
> www.mail-archive.com: [EMAIL PROTECTED] I have also met with the
> maintainer of the site. MH (the MUA) is used as the core manager of
> mail and a heuristic filter is used to determine which archive a
> message goes to. I do believe original mail is stored since
> archives have been regenerated.
>
> > > As a start, look at mhmsgfile.pl that is part of the MHonArc distribution
> > > (used by mha-dbrecover).
> >
> > Thanks, that was a good starting point for extracting header info from the
> > messages. Is there also a subroutine for extracting bodies? If so, I
> > hardly need to do any work! :-)
>
> All you need to know is the special comment declarations used to
> delimit the message body data. In later versions of MHonArc,
> more comment declarations were created to provide better granularity
> on delimiting the types of data on a message page. Here is the
> structure of the comments associated with the actual message data
> on a message page:
>
> <!--X-MsgBody-->
> <!--X-Subject-Header-Begin-->
> ... expanded value of SUBJECTHEADER resource ...
> <!--X-Subject-Header-End-->
> <!--X-Head-of-Message-->
> ... converted message header fields ...
> <!--X-Head-of-Message-End-->
> <!--X-Head-Body-Sep-Begin-->
> ... expanded value of HEADBODYSEP resource ...
> <!--X-Head-Body-Sep-End-->
> <!--X-Body-of-Message-->
> ... converted message body ...
> <!--X-Body-of-Message-End-->
> <!--X-MsgBody-End-->
>
> Messages converted with earlier versions of MHonArc will not have
> all the comment declarations above. I'd have to review past releases
> to verify what was generated.
>
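As a rough illustration of the point above, pulling the header and body
regions out of a message page only takes a search for the comment pairs.
This Python sketch uses the marker strings from the quoted list; the
names SECTIONS and extract_section are hypothetical, and pages written by
older MHonArc versions may lack some markers, in which case it returns
None.

```python
# Begin/end comment pairs as listed in the quoted message.
SECTIONS = {
    "head": ("<!--X-Head-of-Message-->", "<!--X-Head-of-Message-End-->"),
    "body": ("<!--X-Body-of-Message-->", "<!--X-Body-of-Message-End-->"),
}


def extract_section(html, name):
    """Return the text between a marker pair, or None if either is missing."""
    begin, end = SECTIONS[name]
    start = html.find(begin)
    if start == -1:
        return None
    start += len(begin)
    stop = html.find(end, start)
    if stop == -1:
        return None
    return html[start:stop].strip()
```

The returned text is still converted HTML, so a real converter would
have to undo the markup and entity encoding before writing mbox output.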
>
> > POSSIBLE BUG:
> >
> > In glancing through the file mhtxtenrich.pl, I ran across the following
> > line of code (line 58 of mhtxtenrich.pl 2.3 99/06/25 14:18:01):
> >
> > $data =~ s|<<|\&lt;|gi;
> >
> > Didn't you mean
> >
> > $data =~ s|<|\&lt;|gi;
>
> No. text/enriched is syntactically similar to HTML, but there are
> differences. To get a literal "<" to show up in text, you use
> "<<". Check the RFC for text/enriched for more details.
>
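To make the escape rule concrete, here is a small Python sketch of the
same substitution. It only covers the "<<" escape and leaves every other
text/enriched command untouched; the function name is invented for this
example.

```python
def enriched_literals_to_html(text):
    """Turn the text/enriched "<<" escape into the HTML entity for "<"."""
    # A single "<" starts a formatting command, so only the doubled
    # form is rewritten here.
    return text.replace("<<", "&lt;")
```

Replacing a single "<" instead would also mangle the formatting
commands, which is why the doubled form is the one to match.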
> > By the way, what revision control system are you using? I've only used
> > CVS before and I was wondering about the @(#) prefix in the file IDs.
>
> I use SCCS (with a custom Perl front-end I wrote to handle multiple
> directories): an older source code control system that exists on Unix
> systems. SCCS is at the same level as RCS, just the management of
> individual files. The "@(#)" is a marker for the `what' command for
> extracting version information from programs. It is/was common in C
> programs to do the following:
>
> static const char sccs_id[] = "@(#) <some version info>";
>
> for each source file. So a person could do "what
> <program/library-filename>" to get the version information for all
> source files associated with the program.
>
> If using SCCS, the static declaration may look like:
>
> static const char sccs_id[] = "%Z% %M% %I% %E% %U%";
>
> Since many now use other types of source code management tools
> (like RCS, CVS, etc), one typically does:
>
> static const char sccs_id[] = "@(#) $Id:$";
>
> or something similar. This is common for commercial-based programs
> since commercial Unix OSs have SCCS as part of the base OS, including
> the `what' command. Note, the what command can be emulated as
> follows:
>
> strings <file> | grep "@(#)"
>
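For systems with neither `what' nor strings, the same scan can be
sketched in Python. The set of characters that end an identification
string (double quote, ">", newline, backslash, NUL) is an assumption
based on how what(1) is commonly documented, and what_strings is a
made-up name.

```python
import re

# Text after the "@(#)" marker, up to an assumed terminator character.
WHAT_RE = re.compile(rb'@\(#\)([^">\n\\\x00]*)')


def what_strings(data):
    """Return the identification strings found in a bytes object."""
    return [
        match.group(1).decode("latin-1").strip()
        for match in WHAT_RE.finditer(data)
    ]
```

Reading a program or library in binary mode and passing the bytes to
what_strings should list one entry per source file that embeds a marker.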
> When I finally got my own PC and Linux installed, I was almost forced
> to move to something like CVS/RCS since SCCS does not come with Linux
> distributions. However, I was able to find a free implementation of
> SCCS that I could build so I avoided migrating my source to a new
> system.
>
> In the long term, moving to CVS will be better since it does have
> better management capabilities than my simple Perl front-end to
> SCCS has. I just have not bothered to do it yet.
>
> BTW, if you are wondering, I used SCCS since I was not familiar
> with CVS, and at the time, it was the only common version control system
> that I had access to. Much of MHonArc's development occurred on
> commercial Unix OSs before I ever started using Linux.
>
> --ewh
>
>
Regards,
AnthonyW