Re: HTML::Parser question

Reinier Post Tue, 30 Oct 2001 13:48:01 -0800

On Mon, Oct 29, 2001 at 10:00:23PM -0600, ADJE WebMail Technical Support Team wrote:
> Question: How do I extract the plain text from an HTML file, or, put
> another way, how do I remove the html markups, just leaving the plain
> text?  I have looked at the example provided in HTML::Parser, in
> particular
> 
> HTML-Parser-3.25/eg/htext
> 
> which comes close to what I need, however, I would like to store the
> plain text in a variable, as opposed to having it to STDOUT (standard
> output).... any ideas??


Try

  perl -MLWP::Simple -MHTML::TreeBuilder \
    -e 'my $text = HTML::TreeBuilder->new' \
    -e '->parse(LWP::Simple::get("http://www/"))->as_text;' \
    -e 'print $text'

You probably want to improve on it.
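
If you would rather stay with HTML::Parser itself, something along these
lines should collect the text in a variable instead of printing it. This is
only a sketch: html_to_text is a name made up for illustration, and it makes
no attempt to skip <script> or <style> contents or to insert whitespace
between adjacent elements.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper (not part of HTML::Parser): returns the text
# content of an HTML string with all markup removed.
sub html_to_text {
    my ($html) = @_;
    my $text = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # The text handler is called for each chunk of text; the 'dtext'
        # argspec delivers it with entities such as &amp; decoded.
        text_h => [ sub { $text .= shift }, 'dtext' ],
    );

    $parser->parse($html);
    $parser->eof;    # signal end of input so buffered text is flushed

    return $text;
}

# The result ends up in an ordinary scalar, not on STDOUT.
my $text = html_to_text(
    '<html><body><h1>Hello</h1><p>plain &amp; simple</p></body></html>'
);
print "$text\n";
```

To run it on a fetched page, pass the string returned by
LWP::Simple::get() to the helper, after checking that it is defined.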

-- 
Reinier
