You could also start with the 'h2mbx.pl' script

  http://www.albany.net/~anthonyw/archivedemo/script.txt

  http://www.albany.net/~anthonyw/archivedemo/


and modify it to parse your html files.


On 26 Apr 2000, François Pinard wrote:

> Louis Proyect <[EMAIL PROTECTED]> writes:
> 
> > Has anybody written a perl script to convert mhonarc msg html to
> > standard Internet RFC mailbox format?  I want to add old archives to
> > the mail-archive website, but neglected to save the mailbox data that
> > created them originally.
> 
> I made the following script for one particular case, but since MHonArc
> is incredibly configurable, there is little chance for the script to
> work generally.  But it might help you at getting started, who knows...
> 
> To use it, I called a recursive `wget' on the archives, and from within
> the directory, did `unmhonarc * > ../FOLDER' to produce a single big FOLDER
> containing all the archives.  Then, I digested that folder from within Gnus,
> and had fun for a good while, sorting out all the information!
> 
> The following script is put in an executable file named `unmhonarc',
> as you guessed already :-).
> 
> 
> #!/usr/bin/env python
> # Rebuild simple messages from their HTML expression.
> 
> import string, sys
> 
> def main(*arguments):
>     for file in arguments:
>         sys.stderr.write("Processing %s ...\n" % file)
>         lines = open(file).readlines()
>         sys.stdout.write('From nobody@nowhere  Sun Feb 13 06:46:37 2000\n')
>         for counter in range(len(lines)):
>             if lines[counter][0:4] == '<li>':
>                 break
>         write_clean(lines[counter][4:])
>         counter = counter + 1
>         write_clean(lines[counter][4:])
>         counter = counter + 1
>         write_clean(lines[counter][4:])
>         counter = counter + 1
>         sys.stdout.write('Message-Id: <[EMAIL PROTECTED]>\n' % file)
>         sys.stdout.write('\n')
>         while counter < len(lines):
>             if lines[counter] == '<PRE>\n':
>                 break
>             counter = counter + 1
>         counter = counter + 1
>         while counter < len(lines):
>             if lines[counter] == '</PRE>\n':
>                 break
>             write_clean(lines[counter])
>             counter = counter + 1
>         sys.stdout.write('\n')
>         sys.stdout.write('\n')
> 
> def write_clean(line):
>     line = string.replace(line, '&lt;', '<')
>     line = string.replace(line, '&gt;', '>')
>     line = string.replace(line, '&amp;', '&')
>     sys.stdout.write(line)
> 
> if __name__ == '__main__':
>     apply(main, tuple(sys.argv[1:]))
> 
> -- 
> François Pinard   http://www.iro.umontreal.ca/~pinard
> 
> 
> 
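For anyone trying this with a current Python, here is a rough Python 3 sketch of the same approach as the quoted script (string.replace and apply are gone from the language, and the Message-Id line in the quoted copy has lost its format placeholder to address masking, so it fails as shown). It assumes the page layout that script relies on: three consecutive '<li>' header lines, then the body between '<PRE>' and '</PRE>' lines. The function name html_to_message, the example.invalid domain in the Message-Id, the latin-1 input encoding, and the '>From ' escaping of body lines are assumptions of this sketch, not part of the original.

```python
#!/usr/bin/env python3
# Rebuild simple mbox messages from MHonArc-style HTML pages (sketch).

import html
import sys


def html_to_message(lines, msg_id):
    """Return one mbox entry built from the lines of a single HTML page.

    Assumed layout: three consecutive '<li>' header lines, then the
    message body between a '<PRE>' line and a '</PRE>' line.
    """
    # Same dummy mbox separator line as the quoted script.
    out = ['From nobody@nowhere  Sun Feb 13 06:46:37 2000\n']

    # Find the first header line.
    start = next((i for i, line in enumerate(lines)
                  if line.startswith('<li>')), None)
    if start is None:
        raise ValueError('no <li> header lines found')

    # Copy the three header lines, dropping the 4-character '<li>' prefix.
    for line in lines[start:start + 3]:
        out.append(html.unescape(line[4:]))

    # Hypothetical placeholder domain; the original one is masked.
    out.append('Message-Id: <%s@example.invalid>\n' % msg_id)
    out.append('\n')

    # Skip ahead to the line after '<PRE>'; no body if it is missing.
    try:
        body_start = lines.index('<PRE>\n', start + 3) + 1
    except ValueError:
        body_start = len(lines)

    for line in lines[body_start:]:
        if line == '</PRE>\n':
            break
        text = html.unescape(line)
        # Assumption: escape body lines that look like mbox separators.
        if text.startswith('From '):
            text = '>' + text
        out.append(text)

    out.append('\n')
    out.append('\n')
    return ''.join(out)


def main(paths):
    for path in paths:
        sys.stderr.write('Processing %s ...\n' % path)
        # Assumption: the archived pages are Latin-1 encoded.
        with open(path, encoding='latin-1') as handle:
            sys.stdout.write(html_to_message(handle.readlines(), path))


if __name__ == '__main__':
    main(sys.argv[1:])
```

Saved as an executable file, it would be called the same way as the original, from inside the directory of fetched pages, with the output redirected to a single folder file. Unlike the three string replacements in the quoted script, html.unescape decodes every HTML character reference, which may or may not be what you want for a given archive.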

Regards, 

AnthonyW
