Re: HTML::Parser - Extracting out the text from

Reinier Post Mon, 09 Jul 2001 06:04:45 -0700

On Mon, Jul 02, 2001 at 11:17:00AM -0700, Bill Moseley wrote:
> Hello,
> 
> I need to extract text out of html docs to do search word highlighting in
> context.  (You know, like google's output.)
> 
> So, is there a "fastest" method to do this -- better than just using
> HTML::Parser, setting a flag when I catch <body> and then storing the text?

If 'fastest' means 'most convenient', try

  perl -MLWP::Simple -MHTML::TreeBuilder -e \
    'print HTML::TreeBuilder->new->parse(LWP::Simple::get("http://www/"))->as_text'
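
The same idea as a small script, in case the one-liner is too terse. This is
only a sketch: html_to_text is a made-up helper name, the sample HTML string is
invented for illustration, and restricting the output to <body> is an
assumption based on the question (the one-liner above prints the text of the
whole document, title included).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not part of any module): return the text found
# under <body> in an HTML string.
sub html_to_text {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder->new;
    $tree->parse($html);
    $tree->eof;    # tell the parser the input is complete

    # Fall back to the whole tree if no <body> element turns up.
    my $body = $tree->find_by_tag_name('body') || $tree;

    # as_text leaves out the contents of <script> and <style> elements.
    my $text = $body->as_text;

    $tree->delete;    # release the tree explicitly
    return $text;
}

# Invented sample input, so the script runs without network access.
my $sample = '<html><head><title>T</title></head>'
           . '<body><p>Hello <b>world</b></p></body></html>';
print html_to_text($sample), "\n";
```

To run it against a live page, pass the result of LWP::Simple::get($url) to
html_to_text instead of the sample string, after checking that get did not
return undef.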

-- 
Reinier
