On Friday 07 April 2006 13:15, [EMAIL PROTECTED] wrote:
> I'm trying to learn web scraping and am stuck at the basic step of
> scraping a portion of a web page. I'm able to scrape a full page and
> save it as *.xml or *.htm, and I think I understand regex, but the
> following fails:
>
> **************
> # Prints a portion of a red cross web page to a new htm file.
>
> use strict;
> use warnings;
> use LWP::Simple;
> use WWW::Mechanize;
>
> my $url = 'http://www.redcrossnca.org/ServiceCenters/montgomery.php3';
>
> getstore( $url, 'c://redcross.htm' );
>
> open PAGE, 'c://redcross.htm';
> while( my $line = <PAGE> ) {
>     $line =~ /Health and Safety Classes/
>     print "$1\n";
> }
>
> close PAGE;
> ********
>
> Once I get the syntax straight I'll go after more detailed scrapes.
>
> Ken

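First, the loop as posted won't compile: the match statement has no
terminating semicolon, so Perl sees the following `print` where it expects
an operator. Even with the semicolon, the pattern has no parentheses, so
`$1` is never set and `use warnings` will complain about an uninitialized
value. Below is a minimal corrected sketch. The HTML snippet and the
`find_matches` helper are hypothetical; the snippet is read from an
in-memory filehandle instead of the saved redcross.htm so it runs offline.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the saved page; with the real file you would
# open 'c:/redcross.htm' for reading instead of the in-memory string.
my $html = <<'END_HTML';
<h2>Upcoming Events</h2>
<h3>Health and Safety Classes</h3>
<p>CPR and First Aid</p>
END_HTML

# Hypothetical helper: return capture group 1 from every line that
# matches $pattern.
sub find_matches {
    my ( $fh, $pattern ) = @_;
    my @found;
    while ( my $line = <$fh> ) {
        # $1 is only read after a successful match, so it is never stale.
        push @found, $1 if $line =~ $pattern;
    }
    return @found;
}

open my $page, '<', \$html or die "Cannot open page: $!";
# The parentheses in the pattern create capture group 1.
print "$_\n" for find_matches( $page, qr/(Health and Safety Classes)/ );
close $page;
```

Reading `$1` only after a successful match matters because a failed match
does not reset it: `$1` keeps whatever the last successful match left there.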
Have you looked into HTML::TokeParser? It may help with your web
scraping. Chris Ball has written a good article on the subject at:

http://www.perl.com/pub/a/2003/01/22/mechanize.html
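As a rough illustration of the token-based approach, here is a sketch that
pulls the link targets and link text out of a page with HTML::TokeParser.
It assumes the class listings are links; the sample HTML and the
`extract_links` helper are hypothetical and are not the Red Cross page's
actual markup.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical HTML standing in for the saved page; real markup will differ.
my $html = <<'END_HTML';
<h3>Health and Safety Classes</h3>
<ul>
  <li><a href="/classes/cpr.php">Adult CPR</a></li>
  <li><a href="/classes/firstaid.php">First Aid</a></li>
</ul>
END_HTML

# Hypothetical helper: collect [href, link text] pairs for every <a> tag.
sub extract_links {
    my ($html_ref) = @_;
    # new() accepts a reference to a string holding the document.
    my $parser = HTML::TokeParser->new($html_ref)
        or die "Cannot create parser: $!";
    my @links;
    while ( my $token = $parser->get_tag('a') ) {
        # A start-tag token is [$tag, \%attr, \@attrseq, $text].
        my $href = $token->[1]{href} // '';
        # Text up to the closing </a>, with surrounding whitespace trimmed.
        my $text = $parser->get_trimmed_text('/a');
        push @links, [ $href, $text ];
    }
    return @links;
}

for my $link ( extract_links( \$html ) ) {
    print "$link->[1]: $link->[0]\n";
}
```

Working with tags rather than raw lines means the extraction doesn't break
when the site reflows its HTML across different line breaks.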
