Volker,

A couple of days late, but ...
First, less is rather clever at viewing (and saving) HTML files these days.
Have you tried it for this?

Secondly, there are html2text and w3m.
I have used both of these tools and found them to be alright, for my
purposes anyway.

html2text
is a command line utility, written in C++, that converts HTML
documents into plain text.
homepage:
http://userpage.fu-berlin.de/~mbayer/tools/html2text.html

bug page:
http://userpage.fu-berlin.de/~mbayer/tools/problems_html2text.txt
NB: please read the caveat re gcc versions on this page.

download of the most recent version:
html2text version 1.3.1 (stable, released 2002-09-02)
http://userpage.fu-berlin.de/~mbayer/tools/html2text.html#download

w3m
is like links & lynx and *has* a dump option (with controllable column
width) which is of high repute. It has been quite highly regarded within
the LDP[1], for instance, though current opinion may differ.

English homepage:
http://w3m.sourceforge.net/
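
For what it's worth, here is a minimal sketch of driving the w3m dump option
from a script, with the column width passed in. It assumes w3m is installed
and on the PATH; the function names (w3m_dump_command, html_to_text) are made
up for illustration, and only the -dump, -cols and -T options of w3m are used.

```python
import shutil
import subprocess


def w3m_dump_command(cols=80):
    """Build the argument list for w3m's dump mode, reading HTML from stdin."""
    # -dump writes the rendered page as text to stdout,
    # -cols sets the output line width,
    # -T text/html tells w3m that stdin holds HTML.
    return ["w3m", "-dump", "-cols", str(cols), "-T", "text/html"]


def html_to_text(html, cols=80):
    """Render an HTML string as plain text via w3m.

    Returns None when w3m cannot be found (an assumption of this sketch,
    not w3m behaviour).
    """
    if shutil.which("w3m") is None:
        return None
    result = subprocess.run(
        w3m_dump_command(cols),
        input=html,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(html_to_text("<h1>Hello</h1><p>Some <b>bold</b> text.</p>", cols=40))
```

Keeping the command in its own function makes it easy to swap in another
converter and compare the output on a page that one tool mangles.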

Thirdly, other beasts of similar function, though of less recent vintage,
may be found at:
http://www.ibiblio.org/pub/Linux/apps/www/converters/!INDEX.html

Lastly, there is Vilistextum, which I have never used and only mention
because I know that a lot of European mutt users are fond of it.
homepage:
http://www.mysunrise.ch/users/bhaak/vilistextum/

Good luck & cheers
peter

[1] the Linux Documentation Project

=================================================
On Sat, 30 Aug 2003 23:27:45 +1200
Volker Kuhlmann <[EMAIL PROTECTED]> wrote:

> What do people use to convert html pages to a legible formatted text
> representation? I find that netscape 4.[78] is by far the best (save as
> text), I can't recall it having ever let me down. Occasionally, lynx
> -dump and html2text produce better results, but frequently both of them
> also produce downright rubbish (it all depends on the particular page).
> Mozilla unfortunately didn't copy the function from netscape 4, and
> produces output no better than that of html2text/lynx, but the latter 2
> are considerably less bloated (understatement).
> 
> I expect netscape 4 to be dropped any time, and I'd prefer a command
> line solution. Is there anything better than netscape 4?
> 
> Thanks,
> 
> Volker
> 
> -- 
> Volker Kuhlmann                       is possibly list0570 with the domain in header
> http://volker.dnsalias.net/           Please do not CC list postings to me.
> 
