Dear web perlers,

I have encountered a web page that consistently hangs no matter which browsing
agent is used: IE, Firefox and (this is my problem) LWP::UserAgent ("ua").

This page is at
http://www.tase.co.il/TASEEng/MarketData/Indices/Additional/IndexHistoryData.htm?Action=1&addTab=&IndexId=166
and the hang occurs when you click on the "Intra-Day Transaction Data" link
and then on the "Display Data" button.

The problem is that when I use ua, it hangs too, forever, and never times
out! A ^C is then required to abort it.
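As far as I understand the LWP::UserAgent documentation, its `timeout` is an inactivity timeout, not a limit on the whole request, so a server that keeps trickling bytes can hold a request open indefinitely. A minimal sketch of an overall deadline, assuming the LWP 6 handler API (`add_handler` with the `request_send` and `response_data` phases); `make_deadline_ua` and the 30-second idle value are hypothetical choices of mine:

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: a user agent that, besides LWP's idle timeout,
# aborts any response body still arriving $deadline_secs seconds after
# the request was sent.
sub make_deadline_ua {
    my ($deadline_secs) = @_;
    my $ua = LWP::UserAgent->new(timeout => 30);   # idle timeout, in seconds
    my $sent_at = time;

    $ua->add_handler(request_send => sub {
        $sent_at = time;    # remember when the latest request went out
        return;             # returning a response here would short-circuit it
    });
    $ua->add_handler(response_data => sub {
        die "deadline of $deadline_secs s exceeded\n"
            if time - $sent_at > $deadline_secs;
        return 1;           # true: keep calling us for later chunks
    });
    return $ua;
}
```

LWP appears to catch a `die` raised inside a `response_data` handler and record it in the response's `X-Died` header, so after `my $res = make_deadline_ua(300)->get($url);` one would check `$res->header('X-Died')`. Since WWW::Mechanize is a subclass of LWP::UserAgent, the same two handlers could presumably be added to `$mech`. If the server sends nothing at all, only the idle timeout applies.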

Here is my environment:
        * OS: x64 Windows 7 Pro on an Intel x64 processor
        * Perl: ActiveState Perl v5.14.2 built for MSWin32-x64-multi-thread (with 1 registered patch, see perl -V for more detail)
        * WWW::Mechanize: version 1.72
        * LWP (including LWP::UserAgent and LWP::Protocol::http): version 6.02

I am including a program that demonstrates both a good page and the bad one
above. If it turns out to be too long to be accepted by the list server, I
will send it to anybody on request.

This program uses WWW::Mechanize, but I traced the hang to the
"$protocol->request" call at line 193 of LWP::UserAgent, reached from
WWW::Mechanize's 'submit_form' method. At that point the $protocol object is
of class 'LWP::Protocol::http', and there I got lost: I am too much of a
newbie to understand what is going on inside it.
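One way to tell whether the server or LWP is at fault might be to send the request over a plain socket and watch whether the bytes stop arriving. A minimal sketch of the reading side, using only core modules (IO::Select and `sysread`); `read_until_stall` is a hypothetical helper, and connecting and writing the request are left to the caller:

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper: read from an already-connected socket until EOF,
# giving up if no bytes arrive for $idle_secs. Returns the bytes read and
# a flag that is true if the read stalled rather than reaching EOF.
sub read_until_stall {
    my ($sock, $idle_secs) = @_;
    my $sel = IO::Select->new($sock);
    my $buf = '';
    while (1) {
        my @ready = $sel->can_read($idle_secs);
        return ($buf, 1) unless @ready;           # nothing arrived in time
        my $n = sysread($sock, my $chunk, 8192);
        return ($buf, 0) unless $n;               # EOF (0) or error (undef)
        $buf .= $chunk;
    }
}
```

For example, after connecting with `IO::Socket::INET->new(PeerAddr => 'www.tase.co.il', PeerPort => 80)` and writing the POST by hand, `read_until_stall($sock, 60)` would show whether the server ever answers, and how much it sends before going quiet.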

Can anybody show me how to trace the problem further?
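A minimal sketch of the kind of tracing I have in mind, assuming the LWP 6 handler phases `request_send`, `response_header`, `response_data` and `response_done`; `add_tracing` and its optional logging callback are hypothetical names of mine:

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: log each phase of every exchange made through $ua,
# so the last line printed before a hang shows how far the request got.
# $log defaults to printing on STDERR.
sub add_tracing {
    my ($ua, $log) = @_;
    $log ||= sub { print STDERR @_ };
    my $bytes = 0;

    $ua->add_handler(request_send => sub {
        my ($request) = @_;
        $bytes = 0;
        $log->("SEND\n" . $request->dump(maxlength => 512));
        return;    # returning a response here would short-circuit the request
    });
    $ua->add_handler(response_header => sub {
        my ($response) = @_;
        $log->("HEADERS: " . $response->status_line . "\n"
               . $response->headers->as_string);
        return;
    });
    $ua->add_handler(response_data => sub {
        my ($response, $agent, $handler, $data) = @_;
        $bytes += length $data;
        $log->("DATA: $bytes bytes so far\n");
        return 1;  # true: keep calling us for later chunks
    });
    $ua->add_handler(response_done => sub {
        my ($response) = @_;
        $log->("DONE: " . $response->status_line . "\n");
        return;
    });
    return $ua;
}
```

Because WWW::Mechanize is a subclass of LWP::UserAgent, `add_tracing($mech)` should work on the script's `$mech` too. If the log stops after SEND, no response headers ever arrived; if it stops after HEADERS or the DATA count creeps up slowly, the body is what stalls.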

Regards,
Meir

============================
#!/usr/bin/perl
# Copyright Juan Pedro Paredes Caballero <jua...@iquis.com>

use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;
use LWP::ConnCache;

# This URL points to a HANGING page. A request to it always hangs, and
# causes LWP::UserAgent to hang too:
my $urlbase =
  "http://www.tase.co.il/TASEEng/MarketData/Indices/Additional/IndexHistoryData.htm?Action=2&IndexId=166&subDataType=0";

# For comparison, this URL leads to a good page that responds normally:
#my $urlbase =
#  "http://www.tase.co.il/TASEEng/MarketData/Indices/MarketCap/IndexHistoryData.htm?Action=1&addTab=&IndexId=142";

# And this is the second URL required to complete the query:
my $urltsv =
  "http://www.tase.co.il/TASE/Pages/Export.aspx?tbl=0&Columns=AddColColumnsHistory&Titles=AddColTitlesHistory&sn=dsHistory&enumTblType=GridHistoryinner&ExportType=4";

# Connection cache, so the session can reuse its connections
my $conn_cache = LWP::ConnCache->new;
# Cookie store. This is the key to a successful query: we must obtain a
# query cookie, store it in the cookie jar and keep it across requests.
my $cookie_jar = HTTP::Cookies->new;
# Create the Mechanize session with our connection cache and cookie jar
my $mech = WWW::Mechanize->new(
    conn_cache => $conn_cache,
    cookie_jar => $cookie_jar,
);

# Some headers to emulate a Firefox browser
$mech->add_header('User-Agent',
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12');
$mech->add_header('Accept',
    'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8');
$mech->add_header('Accept-Language', 'en-us;q=0.8,en;q=0.3');
$mech->add_header('Accept-Encoding', 'gzip,deflate');
$mech->add_header('Accept-Charset', 'utf-8;q=0.7,*;q=0.7');

# We get the content of the base URL
#print "URL:$urlbase\n";
$mech->get($urlbase);
my $out = $mech->content();

# To obtain the final TSV, one must bypass/emulate some changes usually made
# by JavaScript: the form action is not correct, the submit button is not a
# real submit button, and the hiddenID field unlocks the request and tells
# the server which kind of request this is. First we modify the form action
# to bypass a JavaScript submit:
my ($base) = $out =~ /base href="(.*?)"/
    or die "No <base href> found in $urlbase\n";
$out =~ s/action=".*?" /action="$base" /;

# We turn the button into a real submit button to bypass the JavaScript
# submit (\s+ tolerates variations in the spacing of the page's HTML):
$out =~ s{<input\s+type="button"\s+value="Display Data"\s+Class="RegularButton"\s+Width="70"\s+onclick="frmsubmit\('1'\)"\s*>}
         {<input type="submit" name="Display Data" value="Display Data" Class="RegularButton" Width="70" onclick="frmsubmit('1')" >};
# We update the html page with our non javascript submit:
$mech->update_html($out);

# We then submit the form, setting the hiddenID lock field (another
# JavaScript bypass).
#
# IT IS IN THIS WWW::Mechanize 'submit_form' METHOD THAT THE LWP::UserAgent
# METHOD "$protocol->request" (LINE 193) HANGS ON A HUNG 'TASE' SERVER:
$mech->submit_form(
        form_name => "Form1",
        fields => {
                'HistoryData1$hiddenID' => "1"
                  },
        button => "Display Data"
);

# And this is the last stage of the download request:
$mech->get($urltsv);
#print "URL:$urltsv\n";

$out = $mech->content();
# Collapse each run of CR/LF characters into a single line feed
$out=~s/[\r\n]+/\n/g;

my $response = $mech->response;
my $filename = $response->filename;
$filename = "TSV2_$filename";
open(my $tsv, '>:encoding(UTF-8)', $filename)
    or die "Cannot open $filename for writing: $!\n";
print {$tsv} $out;
close($tsv);

print "TSV saved to $filename\n";
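Instead of editing the page's HTML with regular expressions, the same three changes might be applied to the parsed form. A minimal sketch, assuming HTML::Form (the form class WWW::Mechanize depends on); `prepare_form`, its arguments and any sample URLs are hypothetical:

```perl
use strict;
use warnings;
use HTML::Form;
use URI;

# Hypothetical helper: apply the script's three JavaScript bypasses to a
# parsed HTML::Form object instead of rewriting the page's HTML.
sub prepare_form {
    my ($html, $page_url, $new_action) = @_;
    my ($form) = grep { ($_->attr('name') || '') eq 'Form1' }
                 HTML::Form->parse($html, $page_url);
    die "Form1 not found\n" unless $form;

    # 1. Point the form at the corrected action URL.
    $form->action(URI->new($new_action));

    # 2. Add a real submit button named "Display Data".
    $form->push_input(submit => {
        name  => 'Display Data',
        value => 'Display Data',
    });

    # 3. Set the hidden "lock" field; hidden inputs start out readonly.
    my $lock = $form->find_input('HistoryData1$hiddenID')
        or die "hiddenID field not found\n";
    $lock->readonly(0);
    $lock->value(1);

    return $form;
}
```

`$form->click('Display Data')` then returns the HTTP::Request to send. In the script, `$mech->form_name('Form1')` should return the HTML::Form object Mechanize is working with, so the same calls could presumably be made on it (with `$base` as the new action) before `$mech->click('Display Data')`.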

