matthew wickline wrote:

ChaosMK2 <[EMAIL PROTECTED]> wrote:
In between, another unimportant page is shown
that indicates that the server is working on
my request. My problem is how to ignore that
intermediate page and get the last, important
result page.


You'll need to look at how the intermediate page works. Chances are that
it uses either JavaScript or a meta tag to redirect/refresh every so
often, and that once the result is ready, that URL gives real results
instead of another intermediate page.
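
For the meta tag case, finding the refresh target can be sketched roughly as below. This is only an illustration: the helper name meta_refresh_target is hypothetical, and a regex is a crude stand-in for a real HTML parser, so it will miss unusual markup.

```perl
use strict;
use warnings;

# Hypothetical helper: return (delay_seconds, url) from the first
# <meta http-equiv="refresh" ...> tag, or an empty list if there is none.
# The url is undef when the tag only gives a delay (refresh same page).
sub meta_refresh_target {
    my ($html) = @_;
    while ($html =~ /<meta\b([^>]*)>/gi) {
        my $attrs = $1;
        next unless $attrs =~ /http-equiv\s*=\s*["']?refresh\b/i;
        next unless $attrs =~ /content\s*=\s*(["'])(.*?)\1/is;
        my $content = $2;
        my ($delay, $url) =
            $content =~ /^\s*(\d+)\s*(?:;\s*url\s*=\s*(\S+))?/i;
        next unless defined $delay;
        $url =~ s/^['"]|['"]$//g if defined $url;   # strip inner quotes
        return ($delay, $url);
    }
    return;
}
```

If this returns an empty list, the page probably relies on JavaScript or on some other mechanism, and the page source needs a closer look.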

You'll want to parse out that redirect/refresh target (or, assuming the
URL is static, just get it off the mech object) and keep trying that URL
until the content no longer resembles the intermediate page (presumably
with a polite sleep() between requests). Once the content no longer looks
like the intermediate page, it should hopefully be your final results.
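
As a minimal sketch of that retry loop: the helper poll_until_done and its parameter names are made up for illustration, and the fetching is passed in as a code reference so the loop itself does not depend on any particular module.

```perl
use strict;
use warnings;

# Hypothetical helper: call fetch until is_done accepts the content.
#   fetch     - coderef returning the page content for one request
#   is_done   - coderef returning true once the content is the final result
#   delay     - seconds to sleep between requests (default 10)
#   max_tries - give up after this many requests (default 60)
# Returns the final content, or undef if max_tries is exhausted.
sub poll_until_done {
    my (%arg) = @_;
    my $delay     = defined $arg{delay}     ? $arg{delay}     : 10;
    my $max_tries = defined $arg{max_tries} ? $arg{max_tries} : 60;

    for my $try (1 .. $max_tries) {
        my $content = $arg{fetch}->();
        return $content if defined $content && $arg{is_done}->($content);
        sleep $delay if $delay > 0 && $try < $max_tries;   # be polite
    }
    return;    # still looks like the intermediate page
}

# Possible use with WWW::Mechanize, assuming $mech and $url already exist:
#   my $html = poll_until_done(
#       fetch   => sub { $mech->get($url); $mech->content },
#       is_done => sub { $_[0] !~ /please wait/i },
#   );
```

The important part is that every pass through the loop makes a fresh request; the is_done test has to be adapted to whatever the intermediate page actually contains.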

-matt


Thank you very much for your answer, Matt, but the problem persists. Here are the three intermediate pages:

Page1:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html
   PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN"
   "http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US"><head><title>Tcoffee monitoring</title>
</head><body bgcolor="#F6F6FF">Processing, please wait...</body></html>

Page2:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html
   PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN"
   "http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US"><head><title>Tcoffee monitoring</title> </head><body bgcolor="#F6F6FF">Processing, please wait....<CENTER><IMAGE src=/Tcoffee/Images/l5.gif></CENTER>time:13 seconds<br></body></html>

Page3:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html
   PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN"
   "http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US"><head><title>Tcoffee monitoring</title>
</head><body bgcolor="#F6F6FF">Your job is finished</body></html>

The code that I have added following your instructions:

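# Loop while the last response still looks like the monitoring page.
while ($mech->success() and $mech->title() eq "Tcoffee monitoring")
{
    print $mech->title(), "\n";
    print $mech->uri(), "\n";
    sleep(10);
}

# Save whatever page $mech holds once the loop exits.
if ($mech->success())
{
    open(my $fh, ">", "TCoffee_$multizFile.html")
        or die "Cannot write TCoffee_$multizFile.html: $!";
    print $fh $mech->content();
    close $fh;
}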

The problem remains that $mech stores just the first response it fetches, and I don't know how to get past it. As you can see, there are no links or forms on the intermediate pages. Nevertheless, thank you very much for your efforts. Maybe the site is designed that way to keep scripts from contacting the server. I have also tried a CPAN module that implements the same algorithm the page offers, but that failed too. Seemingly there are still bugs in that module, and it is only UNIX compatible.
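
One observation that may help, offered tentatively: all three pages quoted above share the title "Tcoffee monitoring", including the one that says the job is finished, so a test on the title alone cannot tell them apart, and a loop that only sleeps never asks the server for a new page. A rough sketch of a check on the body text follows; the helper name job_state is hypothetical and the strings are taken from the pages above.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a Tcoffee monitoring page by its body text.
sub job_state {
    my ($html) = @_;
    return 'finished' if $html =~ /Your job is finished/;
    return 'waiting'  if $html =~ /Processing, please wait/;
    return 'unknown';
}

# Possible use, assuming $mech already holds the first monitoring page.
# Note that reload() repeats the last request, so if that request was a
# form submission it may be sent again:
#   while ($mech->success() and job_state($mech->content()) eq 'waiting') {
#       sleep(10);
#       $mech->reload();
#   }
```

Whether a plain reload is enough depends on how the server delivers the later pages, which the quoted HTML alone does not show.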

Sebastian
