Since it's in native XML format, I would use XML::Simple to parse it into
a hash; then you can format it however you want by looping through the
hash.
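
Something along these lines, for example (untested, and the hash keys in
the commented-out loop are only a guess at how the roll call XML comes
out, so dump the structure first and adjust to match what you actually
get back):

#!/usr/bin/perl
use strict;
use warnings;

use WWW::Mechanize;
use XML::Simple;
use Data::Dumper;

my $browser = WWW::Mechanize->new();
$browser->get('http://clerk.house.gov/evs/2005/roll667.xml');

# Parse the XML text into a nested hash reference.
my $vote = XMLin( $browser->content );

# Dump it once to see the real element names ...
print Dumper($vote);

# ... then loop over whichever parts you want, e.g. (keys are guesses):
# foreach my $rec ( @{ $vote->{'vote-data'}{'recorded-vote'} } ) {
#     print "$rec->{legislator}{content},$rec->{vote}\n";
# }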

On 6/6/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
The script below scrapes a House of Representatives vote page, which is in
XML, and saves it in a spreadsheet that is best opened as a read-only
.xls.  How can I:

1) scrape multiple vote pages into individual spreadsheets with a single
script?

2) Only scrape columns C, F, G, H in the result here?  I'd also prefer to
have the spreadsheet as a CSV, but that doesn't work by just changing
*.xls to *.csv.  Thanks in advance.

Ken

#!/usr/bin/perl

use strict;
use warnings;

use WWW::Mechanize;

my $output_dir   = "c:/training/bc";
my $starting_url = "http://clerk.house.gov/evs/2005/roll667.xml";

my $browser = WWW::Mechanize->new();
$browser->get( $starting_url );

# Echo the page to the screen, one line at a time.
foreach my $line ( split /[\n\r]+/, $browser->content ) {
    print "$line\n";
}

# Save the same content to a file.  Note this is still raw XML; Excel
# just happens to open it, it isn't a real .xls spreadsheet.
open my $out, '>', "$output_dir/vote667.xls" or die "Can't open file: $!";
foreach my $line ( split /[\n\r]+/, $browser->content ) {
    print {$out} "$line\n";
}
close $out;
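
For question 1, one rough way to get several vote pages with a single
script is to loop over the roll numbers and build the URL and output
file name from each.  A minimal sketch (the roll numbers below are just
examples):

#!/usr/bin/perl
use strict;
use warnings;

use WWW::Mechanize;

my $output_dir = "c:/training/bc";
my $browser    = WWW::Mechanize->new();

# Example roll numbers only; substitute the votes you actually want.
foreach my $roll ( 665, 666, 667 ) {
    $browser->get("http://clerk.house.gov/evs/2005/roll$roll.xml");

    # Save each vote page to its own file.
    open my $out, '>', "$output_dir/vote$roll.xml"
        or die "Can't open file: $!";
    print {$out} $browser->content;
    close $out;
}

For question 2, picking out individual columns and writing a real CSV is
a parsing job rather than a rename: pull the fields you want out of the
XML::Simple hash (see the sketch above) and print them joined with
commas to a *.csv file.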








--
Anthony Ettinger
Signature: http://chovy.dyndns.org/hcard.html


