Hello,
 
I could use some help with Mechanize, and Andy Lester recommended I post
an email on the libwww mailing list.  I am trying to do what should be a
simple scrape of the US Patent and Trademark Office website for the
bibliographic info they post for all patents.  Unfortunately, I keep
getting re-routed to a page that says
 
"We are unable to display the requested information. Please note that
all requests must be made using this form."
 
Do you think I am out of luck, or are there some things I can try?  The
form that is used to request the patent info does have the following
JavaScript, although it appears to do nothing but move keyboard focus to
the patent-number field:
 
<script language="JavaScript" type="text/javascript">
  <!--
    document.forms["mfInputForm"].elements["patentNum"].focus()
  // -->
</script>
 
Basically, I am wondering how the website could know that I am using
Mechanize and not Internet Explorer to enter the info into the fields
and click "submit."
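 
One guess: by default Mechanize identifies itself with a User-Agent
header like "WWW-Mechanize/#.##", which the server could be filtering
on.  Here is a minimal sketch of masquerading as a browser, assuming the
agent_alias() method and its 'Windows IE 6' alias are available in my
version of WWW::Mechanize:
 
# Sketch: present a browser-like User-Agent instead of the default
# WWW-Mechanize string, in case the server filters on it.
my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->agent_alias('Windows IE 6');
# Or set the header string by hand:
# $mech->agent('Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)');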
 
 
Here is my Perl code.  Thanks.
 
 
#!/usr/local/bin/perl -w
print "Content-type: text/html\n\n";
use strict;
use WWW::Mechanize;
use Crypt::SSLeay;   # SSL support for LWP, needed for the https URL
my $url = "https://ramps.uspto.gov/eram/";
my $maintenancepatent = "5771669";
my $maintenanceapp = "08672157";
my $outfile = "out.htm";
my $mech = WWW::Mechanize->new( autocheck => 1);
# $mech->proxy(['https'], '');  # an empty proxy URL can break the request; only set this if you are actually behind a proxy
$mech->get($url);
$mech->follow_link(text => "Pay or Look up Patent Maintenance Fees", n => 1);
$mech->form_name('mfInputForm');
$mech->field(patentNum => $maintenancepatent);
$mech->field(applicationNum => $maintenanceapp);
$mech->add_header( Referer => $url ); 
$mech->click_button(number => 2);
# Save the resulting page for inspection.
my $output_page = $mech->content();
open(my $outfh, '>', $outfile) or die "Cannot open $outfile: $!";
print $outfh $output_page;
close($outfh);
print "done";
