Gary Kline wrote:
On Tue, May 15, 2007 at 03:34:14PM +1000, Ian Smith wrote:
On Sat, 12 May 2007 14:34:52 -0700 Gary Kline <[EMAIL PROTECTED]> wrote:
 > On Mon, May 14, 2007 at 12:09:07PM -0700, Chuck Swiger wrote:
 > > On May 12, 2007, at 12:54 PM, Gary Kline wrote:
 > > >This is for those of us who appreciate ASCII or straight
 > > > ISO_8859-15 rather than marked up files.  I have slapped together
 > > > a crude C program that does scotch (or *cleanse*) text of
 > > > <B></B> and so on.   Still... is there some standalone converter
 > > > that gets rid of markup more elegantly?   Something where I
 > > > can say
 > > >
 > > > % cmd file_1.html ... file_N.html and output file_1.text ...
 > > > file_N.text?
 > >
 > > Perhaps:
 > >
 > > lynx -dump file1.html ... > file.text
 > >
 > > ...?
 >
 > Hm, maybe I need Bill Campbell's -force_html switch.
 >
 > Yes, seems that way.  Using just -dump got most of them, but
 >   using the -force_html caught all.  Need to script something to
 >   reformat, but the worst of it's done!

Also, if using Mozilla (so, I would assume, Firefox) the 'Save Page As'
dialog offers a picklist for 'Files of Type' that includes 'Text Files'.

This does a pretty decent job of producing text from HTML files, and is
quicker than firing up lynx (or links) if you're already viewing a page.

        Oh sure; I've been saving html as text, ascii/8859-1, for years.
        But what I've got, and there are more saved **somewhere**, are
        files that were saved by default in markup.  I have a slew of
        these on different boxen and have been moving them to one place.
        Problem is: how to de-html the bunch.
        I'm too lazy to write something that would automate what can be
        automated--markup like "&foo;" is problematic.  So probably the
        easiest way would be to create a script that is just a wrapper
        around lynx.
        I don't think I'm the only hacker who wants just-plain-ascii, so
        this might make a good project for somebody who's new to C or
        perl.   That's my two pennies' worth!
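
A wrapper of the kind described above might look roughly like the following.
This is only a sketch in Python, not anything posted in the thread: it assumes
lynx is on the PATH, uses only the -dump and -force_html switches mentioned
earlier, and the function names (output_path, lynx_command, convert) are made
up for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: run `lynx -dump -force_html` on each HTML file."""
import subprocess
import sys
from pathlib import Path


def output_path(html_file):
    """Map file_1.html (or .htm, or no extension) to file_1.text."""
    return Path(html_file).with_suffix(".text")


def lynx_command(html_file):
    """Build the lynx command line; both switches come from the thread."""
    return ["lynx", "-dump", "-force_html", str(html_file)]


def convert(html_file):
    """Write lynx's plain-text rendering of html_file next to the original."""
    target = output_path(html_file)
    if target == Path(html_file):
        # Refuse to truncate the input before lynx has read it.
        raise ValueError(f"{html_file} already ends in .text")
    with open(target, "wb") as out:
        # lynx -dump prints the rendered page on stdout; capture it as bytes
        # so no assumption is made about the character set.
        subprocess.run(lynx_command(html_file), stdout=out, check=True)
    return target


if __name__ == "__main__":
    # Assumed usage: script.py file_1.html ... file_N.html
    for name in sys.argv[1:]:
        print(convert(name))
```

Run with file_1.html ... file_N.html as arguments, this should leave
file_1.text ... file_N.text beside the originals.  Any reformatting of the
dumped text would still need a separate pass.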


Cheers, Ian

If you don't want formatting and the number of tags is trivial, the solution is fairly simple in Perl (less than 150 lines, if even that).
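
For comparison, the same no-formatting approach can be sketched in Python
rather than Perl.  This is a substitution for illustration, not the Perl
solution referred to above: it relies on the standard-library html.parser
module, and the class and function names (TextOnly, strip_markup) are
hypothetical.  Known entities such as "&amp;" are decoded; unknown ones such
as "&foo;" pass through unchanged.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect character data and drop every tag."""

    SKIP = {"script", "style"}  # element contents that are never prose

    def __init__(self):
        # convert_charrefs=True decodes entities before handle_data sees them.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skipping = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skipping += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skipping:
            self.skipping -= 1

    def handle_data(self, data):
        if not self.skipping:
            self.parts.append(data)


def strip_markup(html):
    """Return the text of an HTML string with all tags removed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Unlike lynx, this does no layout at all: line breaks and spacing are whatever
the HTML source happened to contain.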

