[EMAIL PROTECTED] <[EMAIL PROTECTED]> said something to this effect on
09/13/2001:
> I'd like to turn HTML files into text files. Right now I use a simple
> regex [s/<[^>]+>//gs], but I'm sure there are many things that will fall
> through the cracks [including character entities].
>
> I feel sure I am not the first one to have this problem. Does anyone
> know of a module or some other resource for doing this? A CPAN review
> was fruitless [but maybe I just missed it].
The easiest way would be:
$without_html = `lynx -dump -nolist $url`;
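One caveat with the backtick form: $url is interpolated into a shell command line, so a URL containing characters such as & or ; may be misread by the shell. A minimal sketch of the same lynx call using the list form of a pipe open, which bypasses the shell, might look like the following. The sub names lynx_command and html_to_text are my own, not from any module, and this assumes lynx is installed and on your PATH.

```perl
use strict;
use warnings;

# Build the argument list for lynx (hypothetical helper, not from a module).
# -dump writes the rendered page to stdout; -nolist omits the link list.
sub lynx_command {
    my ($url) = @_;
    return ('lynx', '-dump', '-nolist', $url);
}

# Run lynx without going through the shell and return its text output.
sub html_to_text {
    my ($url) = @_;
    open(my $fh, '-|', lynx_command($url))
        or die "Cannot run lynx: $!";
    local $/;                       # slurp the whole output at once
    my $text = <$fh>;
    close($fh) or warn "lynx exited with status $?\n";
    return $text;
}
```

You would then call it as $without_html = html_to_text($url); in place of the backticks.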
There is also Tom Christiansen's striphtml at
<http://www.cpan.org/authors/id/T/TO/TOMC/scripts/striphtml.gz>.
(darren)
--
I have discovered that all human evil comes from this, man's being
unable to sit still in a room.
-- Blaise Pascal