Hi Bret,

Thanks for the swift reply.

I was aware of the deficiency of ActiveState's PPM, having had the same
problem not finding Crypt-SSLeay.  However, I know that's not the solution in
itself, since I had tried my scripts on a Linux box which had it installed.
I also turned the cookie jar on (not knowing particularly what it was doing)
and that did not get me any further.

The advice you give about proxy servers looks very interesting though, so I
shall download SSLeay (am now writing these scripts on Windows, mainly
because I find using IE is the easiest way to develop web scraping scripts
concurrently with the Perl script rather than any advantage of Perl on
Windows) and try that.

Regards
Colin 


-----Original Message-----
From: Bret Swedeen [mailto:[EMAIL PROTECTED]
Sent: 15 July 2004 13:09
To: [EMAIL PROTECTED]; Colin Magee
Subject: Re: Viewing exchange between browser and website


Hi Colin,

I recently attempted a similar task.  I'll try to outline as clearly as
possible what worked.

Using some of the examples from the "Perl & LWP" book was a disaster.  None
of them worked.  One important point I did pick up, however, was to make
sure you have cookies enabled:

use HTTP::Cookies;
$agent->cookie_jar(HTTP::Cookies->new());

Once I added those lines things started to look promising.  Unfortunately,
I was still having problems.  Primary reason: I needed an extra Perl Mod to
make communication over https possible.  I'm using Perl for Win32, so I
needed to install the Perl Mod Crypt::SSLeay.  The problem was that
installing it from the ppm prompt (part of the ActiveState Perl
installation, which makes mod installation very easy) wasn't working.  For
whatever reason I couldn't find Crypt::SSLeay for Perl on Win32.  Finally,
after searching forums on ActiveState, I found the mod and installed it from
the ppm prompt with the following command:

install http://theoryx5.uwinnipeg.ca/ppms/Crypt-SSLeay.ppd

Take the defaults through the entire installation.  (It will also ask you
about a couple of DLLs; just answer yes.)

Ok, now I'm getting real close, but it's still not working.  I posted on the
Usenet forum for Perl Mods and got two extremely helpful tips.

First, install a local proxy of sorts to capture and view the back and forth
communication between browser and web site.  Something I think you are
looking for now.  Proxomitron was what I used.  I turned it on and went
through the web interaction steps with a standard browser.  While this tool
didn't really resolve my problems, it did help me understand more of what
was going on between the browser and the site.
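
One way to put the proxy capture to use is to print the request the script
is about to make and compare it line by line with what the browser sent.
The sketch below only builds the request and does not send it.  The URL is
a made-up placeholder, and the field names username and password are taken
from the example script further down, so treat them as assumptions about
your own login form.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Build (but do not send) a form-encoded login POST.
# $url, $user and $pass are placeholders supplied by the caller.
sub login_request {
    my ($url, $user, $pass) = @_;
    return POST $url, [ username => $user, password => $pass ];
}

# Hypothetical URL and credentials, for illustration only.
my $req = login_request('https://intranet.example.com/login',
                        'someuser', 'secret');

# Print the request line, headers and body for comparison with the
# browser's request as shown by the proxy.
print $req->as_string;
```

Differences in headers, cookies or hidden form fields between this output
and the proxy log are usually the first place to look.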

Second, and the most helpful of all, install the Perl Mod WWW::Mechanize.
This mod allows you to easily automate the steps of interacting with a site,
from simply following links to logging on and communicating over https.
This mod was what finally worked for me.  There was a problem with pressing
certain buttons on the page.  It seems it doesn't really know what to do
with Javascript buttons, but I worked around that by simply making a URL
with all of the form variables set and passing it in to get what I wanted.
This may not be a problem for you, but keep in mind that it doesn't work
with all form buttons exactly as you might think.
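
A minimal sketch of that workaround, assuming the form is submitted with
GET: build the URL from the form variables with the URI module, so that
names and values are escaped properly.  The base URL and the field names
(report, format) are hypothetical; the real ones come from the form on
your page or from the proxy capture.

```perl
use strict;
use warnings;
use URI;

# Build a GET URL that carries form fields as query parameters.
# $base and the name/value pairs in @fields are supplied by the caller.
sub build_form_url {
    my ($base, @fields) = @_;
    my $uri = URI->new($base);
    $uri->query_form(@fields);    # escapes names and values
    return $uri->as_string;
}

# Hypothetical base URL and form fields, for illustration only.
my $url = build_form_url(
    'https://intranet.example.com/charts',
    report => 'weekly summary',
    format => 'html',
);
print "$url\n";
```

The resulting string can then be passed to $agent->get($url) in place of
clicking the button.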

Anyway, another very useful thing during script development is to turn on
the LWP debugging.  With this turned on you get to see all of the
communication details between your script and the site.  It really helps
with troubleshooting, as you can see exactly where things are falling apart.
Add this line near the top of your script, after the use LWP statement:

use LWP::Debug qw(+);
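
On libwww-perl 6 and later, where LWP::Debug is documented as deprecated, a
similar trace can be had with LWP::UserAgent handlers.  This is a sketch of
that alternative, not what was used above; the helper name traced_agent is
made up for illustration.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Return a user agent that dumps every request it sends and every
# response it receives.  traced_agent is a hypothetical helper name.
sub traced_agent {
    my $ua = LWP::UserAgent->new;
    $ua->add_handler(request_send  => sub { shift->dump; return });
    $ua->add_handler(response_done => sub { shift->dump; return });
    return $ua;
}

my $ua = traced_agent();
```

WWW::Mechanize is a subclass of LWP::UserAgent, so the same add_handler
calls should also work on a WWW::Mechanize object.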

Anyway, my experience was somewhat frustrating, but little by little I did
make progress and finally resolved my problem.  Here is a quick glimpse at
what I put together.  Please keep in mind I had to remove some detail, as it
is company specific and I cannot disclose it here.  Also, at the end I dump
the page content that I get back after I send $bigurl into a file with an
html extension.  I would then open this file in a browser to see if I got
what I wanted.  The final version removes some of this code and actually
acts upon the page returned.  I believe, however, this example should help
get you closer to what you want.  Of course, as I found, no one example
addresses your problem exactly the way you need.  Keep working on it.
You'll get there in the end.

use strict;
use warnings;
use LWP;
use LWP::UserAgent;
use LWP::Protocol::https;
use LWP::Debug qw(+);
use WWW::Mechanize;
use HTTP::Cookies;

my $agent = WWW::Mechanize->new();
my $intranetsite = "http://some company intranet site/index.html";
my $bigurl = "https://big url here with form variables and their values";

# Take the username and password from the command line, otherwise prompt.
# Declared here so they stay in scope after the if/else.
my ($un, $pw);
if (@ARGV == 2) {
        ($un, $pw) = @ARGV;
}
else {
        print "Please enter your username: ";
        $un = <STDIN>;
        chomp($un);
        print "Please enter your password: ";
        $pw = <STDIN>;
        chomp($pw);
}

$agent->cookie_jar(HTTP::Cookies->new());
$agent->agent_alias( 'Windows IE 6' );

# Navigate the intranet web site
$agent->get($intranetsite);
$agent->follow_link(text => "Sign In"); # a link on the page
$agent->form_name('login'); # the name of the form on the sign in page
$agent->field(username => $un);
$agent->field(password => $pw);
$agent->click(); # simulate clicking the button on the login page
$agent->follow_link(text => "Internal Application Link"); # a link on the new page
$agent->follow_link(text => "Application Charts"); # a link on the next new page
$agent->get($bigurl); # finally, send the URL with form variables and values

# Dump the page content into a file for viewing in a browser
open(my $logfile, '>', 'output.html') or die "Cannot open output.html: $!";
print $logfile $agent->content();
close($logfile);
__END__


On 15 Jul 2004 at 12:14, Colin Magee wrote:

> Hi,
> 
> I've been trying to use LWP to programmatically log in to a favourite
> password protected website.  
> 
> Problem is that I've worked through all the standard examples on LWP
> and I'm not getting through - the login mechanism doesn't conform to
> the examples, so I was wondering if there is any way I can see exactly
> what my browser is sending and receiving (while I'm using the browser)
> and therefore what I have to replicate in the code.  As you can
> probably tell I'm fairly novicey so I need to see some output where it
> will be fairly clear what I have to code in Perl.  I seem to recall
> some thread on this forum about using Mechanize in this way.  Is that
> correct?  If so is there an example script that shows how to record
> this?
> 
> Thanks
> Colin
> 