Re: [Boston.pm] stripping

darren chamberlain Fri, 14 Sep 2001 06:37:57 -0700

[EMAIL PROTECTED] <[EMAIL PROTECTED]> said something to this effect on 
09/13/2001:
> I'd like to turn HTML files into text files. Right now I use a simple
> regex [s/<[^>]+>//gs], but I'm sure there are many things that will fall
> through the cracks [including character entities].
> 
> I feel sure I am not the first one to have this problem. Does anyone
> know of a module or some other resource for doing this? A CPAN review
> was fruitless [but maybe I just missed it].

The easiest way would be:

$without_html = `lynx -dump -nolist $url`;
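One caveat with the backticks: $url is interpolated into a shell command, so a URL containing characters like ';' or '&' can do surprising things. A minimal sketch that sidesteps the shell with a list-form pipe open is below. The helper name html_to_text and the optional @cmd override are my own inventions for illustration; only the lynx flags come from the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical helper: run a text-mode browser on $url and return its output.
# The default command is the lynx invocation shown above; @cmd is an assumed
# override hook, mainly useful for trying this out without lynx installed.
sub html_to_text {
    my ($url, @cmd) = @_;
    @cmd = ('lynx', '-dump', '-nolist') unless @cmd;

    # List-form pipe open: no shell is involved, so shell metacharacters
    # in $url reach the command as a literal argument.
    open(my $fh, '-|', @cmd, $url)
        or die "cannot run $cmd[0]: $!";

    # Slurp everything the command prints.
    my $text = do { local $/; <$fh> };

    # close() on a pipe also reports a non-zero exit status from the child.
    close($fh)
        or die "$cmd[0] failed: " . ($! ? $! : "exit status " . ($? >> 8));

    return $text;
}
```

With that in place, the call becomes $without_html = html_to_text($url); and a failed or missing lynx dies with a message instead of quietly returning an empty string.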

There is also Tom Christiansen's striphtml at
<http://www.cpan.org/authors/id/T/TO/TOMC/scripts/striphtml.gz>.

(darren)

-- 
I have discovered that all human evil comes from this, man's being
unable to sit still in a room.
    -- Blaise Pascal
