It is fairly straightforward to write such a utility to go from HTML back
to 'mbox'. Remember hyp2mbox.pl
<http://www.albany.net/~anthony/archivedemo> from some time ago? That same
script can be used as a starting point for a mhonarc2mbox script.

 While Erik's original question is over two weeks old by now, I too have
a project that requires me to morph hyp2mbox.pl into this new
utility. Which brings me to a feature request.

 For the msg*.html files that are derived from multipart messages:

1) Save the boundary markers of the message parts
2) Retain the MIME headers of the message parts (those which are not
   displayed inline) inside some X- comment.


Something like:

  <!--X-MsgBody-->
    <!--X-Subject-Header-Begin-->
    ... expanded value of SUBJECTHEADER resource ...
    <!--X-Subject-Header-End-->
    <!--X-Head-of-Message-->
    ... converted message header fields ...
    <!--X-Head-of-Message-End-->
    <!--X-Head-Body-Sep-Begin-->
    ... expanded value of HEADBODYSEP resource ...
    <!--X-Head-Body-Sep-End-->
    <!--X-Body-of-Message-->
    ... converted message body ...
    <!--X-Body-Part-Begin: goobly-gook-uniquestring -->
    <!--X-Content-type: Type/Subtype; name=some.doc.name -->
    <!--X-Content-disposition: attachment ; filename=some.doc.name -->
    <!--X-Content-transfer-encoding: whatever -->
    <P><A HREF="bin00002.bin" >Some.file.name</A></P>
    <!--X-Body-Part-End: goobly-gook-uniquestring -->
    <!--X-Body-Part-Begin: goobly-gook-uniquestring -->
    <!--X-Content-type: Type/Subtype; name=some.other-doc.name -->
    <!--X-Content-disposition: attachment; filename=some.other-doc.name -->
    <!--X-Content-transfer-encoding: whatever -->
    <P><A HREF="bin00002.bin" >Some.other-doc.name</A></P>
    <!--X-Body-Part-End: goobly-gook-uniquestring -->
    <!--X-Body-of-Message-End-->
  <!--X-MsgBody-End-->
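
To make the request concrete, here is a rough sketch of how a mhonarc2mbox
script could read such comments back. It is in Python rather than Perl purely
for illustration. The X-Body-Part-* and X-Content-* comments are the
hypothetical markers proposed above, not something MHonArc is known to emit,
and the sample values (type, encoding, boundary string) are made up.

```python
import re

# Hypothetical markers from the feature request; not existing MHonArc output.
PART_RE = re.compile(
    r"<!--X-Body-Part-Begin:\s*(?P<boundary>\S+)\s*-->"
    r"(?P<content>.*?)"
    r"<!--X-Body-Part-End:\s*(?P=boundary)\s*-->",
    re.DOTALL,
)
# One saved MIME header per comment, e.g. <!--X-Content-type: ... -->
HEADER_RE = re.compile(r"<!--X-(Content-[A-Za-z-]+):\s*(.*?)\s*-->")


def parse_body_parts(body_html):
    """Return a list of (boundary, headers) pairs, one per marked part."""
    parts = []
    for match in PART_RE.finditer(body_html):
        headers = dict(HEADER_RE.findall(match.group("content")))
        parts.append((match.group("boundary"), headers))
    return parts


# Made-up sample in the proposed layout.
SAMPLE_BODY = """\
<!--X-Body-Part-Begin: uniquestring -->
<!--X-Content-type: application/octet-stream; name=some.doc.name -->
<!--X-Content-disposition: attachment; filename=some.doc.name -->
<!--X-Content-transfer-encoding: base64 -->
<P><A HREF="bin00002.bin">some.doc.name</A></P>
<!--X-Body-Part-End: uniquestring -->
"""

for boundary, headers in parse_body_parts(SAMPLE_BODY):
    print("--" + boundary)
    for name, value in headers.items():
        print(f"{name}: {value}")
```

With the boundary string and the original headers recoverable like this, the
converter would only need to re-encode the linked file to rebuild each part.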
  


On Tue, 24 Oct 2000, Earl Hood wrote:

> On October 24, 2000 at 11:19, Erik Rossen wrote:
> 
> > OK, I guess I have a job to do.  I'm kind of surprised that no one has ever
> > asked for this capability before, though.  I guess everyone (except me) is
> 
> It has been brought up before.  But maybe only once or twice.
> 
> > smart enough to keep the original mbox around just in case.  I hope that
> > the people over at www.mail-archive.com are doing that in case VA Linux
> > ever decides to pull the plug.  They are managing over 5,000 mailing lists
> > with Mhonarc!
> 
> There is a mailing list associated with the maintenance of
> www.mail-archive.com: [EMAIL PROTECTED]  I have also met with the
> maintainer of the site.  MH (the MUA) is used as the core manager of
> mail and a heuristic filter is used to determine which archive a
> message goes to.  I do believe original mail is stored since
> archives have been regenerated.
> 
> > > As a start, look at mhmsgfile.pl that is part of the MHonArc distribution
> > > (used by mha-dbrecover).
> > 
> > Thanks, that was a good starting point for extracting header info from the
> > messages.  Is there also a subroutine for extracting bodies?  If so, I
> > hardly need to do any work! :-)
> 
> All you need to know is the special comment declarations used to
> delimit the message body data.  In later versions of MHonArc,
> more comment declarations were created to provide better granularity
> on delimiting the types of data on a message page.  Here is the
> structure of the comments associated with the actual message data
> on a message page:
> 
>   <!--X-MsgBody-->
>     <!--X-Subject-Header-Begin-->
>     ... expanded value of SUBJECTHEADER resource ...
>     <!--X-Subject-Header-End-->
>     <!--X-Head-of-Message-->
>     ... converted message header fields ...
>     <!--X-Head-of-Message-End-->
>     <!--X-Head-Body-Sep-Begin-->
>     ... expanded value of HEADBODYSEP resource ...
>     <!--X-Head-Body-Sep-End-->
>     <!--X-Body-of-Message-->
>     ... converted message body ...
>     <!--X-Body-of-Message-End-->
>   <!--X-MsgBody-End-->
> 
> Messages converted with earlier versions of MHonArc will not have
> all the comment declarations above.  I'd have to review past releases
> to verify what was generated.
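
A minimal sketch of using those comment declarations to pull the header and
body regions out of a message page, in Python for illustration (hyp2mbox.pl
itself is Perl). The function name and the sample page are my own inventions.
It only handles the pairs that follow the "X-Name" / "X-Name-End" pattern, and
it returns None when a pair is missing, which is what one would expect on
pages written by earlier MHonArc versions.

```python
import re
from typing import Optional


def extract_region(page: str, name: str) -> Optional[str]:
    """Return the text between <!--X-name--> and <!--X-name-End-->, or None."""
    pattern = re.compile(
        r"<!--X-" + re.escape(name) + r"-->(.*?)<!--X-" + re.escape(name) + r"-End-->",
        re.DOTALL,
    )
    match = pattern.search(page)
    return match.group(1).strip() if match else None


# Made-up, heavily trimmed message page.
SAMPLE_PAGE = """\
<!--X-MsgBody-->
<!--X-Head-of-Message-->
<UL><LI>From: someone@example.com</LI></UL>
<!--X-Head-of-Message-End-->
<!--X-Body-of-Message-->
<PRE>Hello, world.</PRE>
<!--X-Body-of-Message-End-->
<!--X-MsgBody-End-->
"""

print(extract_region(SAMPLE_PAGE, "Head-of-Message"))
print(extract_region(SAMPLE_PAGE, "Body-of-Message"))
```

The extracted regions are still HTML, so a real converter would have to undo
the HTML conversion before writing mbox output.
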
> 
> 
> > POSSIBLE BUG:
> > 
> > In glancing through the file mhtxtenrich.pl, I ran across the following
> > line of code (line 58 of mhtxtenrich.pl 2.3 99/06/25 14:18:01):
> > 
> >     $data =~ s|<<|\&lt;|gi;
> > 
> > Didn't you mean 
> > 
> >     $data =~ s|<|\&lt;|gi;
> 
> No.  text/enriched is syntactically similar to HTML, but there are
> differences.  To get a literal "<" to show up on text, you use
> "<<".  Check the RFC for text/enriched for more details.
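
A tiny illustration of that one step, in Python rather than the Perl of
mhtxtenrich.pl. The function name is hypothetical and this is not a full
text/enriched converter: it only shows why "<<" is the thing to replace, since
a single "<" opens a formatting command that must survive for later
translation.

```python
def enriched_literals_to_html(data):
    """Turn text/enriched's "<<" escape into an HTML "&lt;" entity.

    Single "<" characters are left alone because they start formatting
    commands such as <bold>, which a later pass would translate.
    """
    return data.replace("<<", "&lt;")


sample = "if a <<b then <bold>stop</bold>"
print(enriched_literals_to_html(sample))
# prints: if a &lt;b then <bold>stop</bold>
```

Replacing every "<" instead would also escape the <bold> command and break
the later translation.
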
> 
> > By the way, what revision control system are you using?  I've only used
> > CVS before and I was wondering about the @(#) prefix in the file IDs.
> 
> I use SCCS (with a custom Perl front-end I wrote to handle multiple
> directories): an older source code control system that exists on Unix
> systems.  SCCS is at the same level as RCS, just the management of
> individual files.  The "@(#)" is a marker for the `what' command for
> extracting version information from programs.  It is/was common in C
> programs to do the following:
> 
>   static const char sccs_id[] = "@(#) <some version info>";
> 
> for each source file.  So a person could do "what
> <program/library-filename>" to get the version information for all
> source files associated with the program.
> 
> If using SCCS, the static declaration may look like:
> 
>   static const char sccs_id[] = "%Z% %M% %I% %E% %U%";
> 
> Since many now use other types of source code management tools
> (like RCS, CVS, etc), one typically does:
> 
>   static const char sccs_id[] = "@(#) $Id:$";
> 
> or something similar.  This is common for commercial-based programs
> since commercial Unix OSs have SCCS as part of the base OS, including
> the `what' command.  Note, the what command can be emulated as
> follows:
> 
>   strings <file> | grep "@(#)"
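
Where neither `what' nor strings is available, the same idea can be sketched
in a few lines of Python. This assumes the usual documented behaviour of
what(1): print the text after each "@(#)" up to the first double quote, ">",
newline, backslash, or NUL. The function name and the sample bytes are made up.

```python
import re

# Text after "@(#)" up to the first ", >, newline, backslash, or NUL
# (assumed to match how what(1) delimits an identification string).
WHAT_RE = re.compile(rb'@\(#\)([^">\n\\\x00]*)')


def what_strings(data):
    """Return the identification strings found in a bytes object."""
    return [m.group(1).decode("latin-1").strip() for m in WHAT_RE.finditer(data)]


# Made-up stand-in for the contents of a compiled program.
sample = b'\x7fELF\x00@(#) demo.c 1.4 00/10/24\x00junk@(#)util.c 2.1"more'
for ident in what_strings(sample):
    print(ident)
```

For a real file, read it in binary mode and pass the bytes to what_strings().
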
> 
> When I finally got my own PC and Linux installed, I was almost forced
> to move to something like CVS/RCS since SCCS does not come with Linux
> distributions.  However, I was able to find a free implementation of
> SCCS that I could build so I avoided migrating my source to a new
> system.
> 
> In the long term, moving to CVS will be better since it does have
> better management capabilities than my simple Perl front-end to
> SCCS has.  I just have not bothered to do it yet.
> 
> BTW, if you are wondering, I used SCCS since I was not familiar
> with CVS, and at the time SCCS was the only common version control
> system that I had access to.  Much of MHonArc's development occurred on
> commercial Unix OSes before I ever started using Linux.
> 
> --ewh
> 
> 

Regards, 

AnthonyW

