On Mon, Oct 29, 2001 at 10:00:23PM -0600, ADJE WebMail Technical Support Team wrote:
> Question: How do I extract the plain text from an HTML file, or, put
> another way, how do I remove the html markups, just leaving the plain
> text? I have looked at the example provided in HTML::Parser, in
> particular
>
> HTML-Parser-3.25/eg/htext
>
> which comes close to what I need; however, I would like to store the
> plain text in a variable, as opposed to having it go to STDOUT (standard
> output). Any ideas?
Try:

perl -MLWP::Simple -MHTML::TreeBuilder \
  -e 'my $tree = HTML::TreeBuilder->new;' \
  -e '$tree->parse(LWP::Simple::get("http://www/")); $tree->eof;' \
  -e 'my $text = $tree->as_text; print $text'
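You will probably want to improve on it, for example by checking that
get() did not return undef (which it does on failure) before parsing.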
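
If you would rather have this in a script than on the command line, here is
a rough sketch of the same idea wrapped in a subroutine, so the plain text
ends up in a variable. The name html_to_text is made up for illustration
and is not part of any module; the sample HTML string is also just a
placeholder. It assumes HTML::TreeBuilder is installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not part of any module): takes a string of HTML
# and returns its text content with the markup removed.
sub html_to_text {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->parse($html);
    $tree->eof;                   # signal that the whole document was fed in
    my $text = $tree->as_text;    # text of the parsed tree, without tags
    $tree->delete;                # release the tree once we are done with it
    return $text;
}

# Placeholder input; in practice this could come from a file or a fetch.
my $text = html_to_text('<html><body><p>Hello <b>world</b></p></body></html>');
print "$text\n";
```

To use it on a fetched page, pass in the result of LWP::Simple::get after
checking that it is defined.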
--
Reinier