Dear web perlers,

I encountered a web page that consistently hangs no matter which user agent is used: IE, Firefox and (this is my problem) LWP::UserAgent ("ua").
This page is at http://www.tase.co.il/TASEEng/MarketData/Indices/Additional/IndexHistoryData.htm?Action=1&addTab=&IndexId=166, when you click on the "Intra-Day Transaction Data" link and then on the "Display data" button.

The problem is that when I use ua, it hangs too, forever, and never times out! A ^C is then required to abort it.

Here is my environment:
* OS: x64 Windows 7 Pro on an Intel x64 processor
* Perl: ActiveState Perl v5.14.2 built for MSWin32-x64-multi-thread (with 1 registered patch, see perl -V for more detail)
* WWW::Mechanize: version 1.72
* LWP (including LWP::UserAgent and LWP::Protocol::http) version 6.02

I am including a demonstration program that can show both a good page and the bad one above. If it turns out to be too long to be accepted by the list server, I will send it to anybody on request.

This program uses WWW::Mechanize, but I traced the hanging point to the "$protocol->request" call in LWP::UserAgent (line 193), reached from the WWW::Mechanize 'submit_form' method. There the $protocol object is of the 'LWP::Protocol::http' class. At this point I got lost. I am too much of a newbie to understand what is going on there.

Can anybody show me what to do to further trace the problem?

Regards,
Meir

============================
#!/usr/bin/perl
# Copyright Juan Pedro Paredes Caballero <jua...@iquis.com>
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;
use LWP::ConnCache;

# This URL points to a HANGING page. A request to it always hangs,
# and causes UserAgent to hang too:
my $urlbase = "http://www.tase.co.il/TASEEng/MarketData/Indices/Additional/IndexHistoryData.htm?Action=2&IndexId=166&subDataType=0";

# For comparison, this URL directs to a good page that does respond well:
#my $urlbase = "http://www.tase.co.il/TASEEng/MarketData/Indices/MarketCap/IndexHistoryData.htm?Action=1&addTab=&IndexId=142";

# And this is the second URL required to complete the query:
my $urltsv = "http://www.tase.co.il/TASE/Pages/Export.aspx?tbl=0&Columns=AddColColumnsHistory&Titles=AddColTitlesHistory&sn=dsHistory&enumTblType=GridHistoryinner&ExportType=4";

# Session cache
my $conn_cache = LWP::ConnCache->new;

# Cookie store. This is the key to a successful query: we must obtain a
# query cookie, store it in the cookie jar and keep it across requests.
my $cookie_jar = HTTP::Cookies->new;

# Create a Mechanize session with our session cache and cookie jar
my $mech = WWW::Mechanize->new(conn_cache => $conn_cache, cookie_jar => $cookie_jar);

# Some headers to emulate a Firefox browser
$mech->add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12');
$mech->add_header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8');
$mech->add_header('Accept-Language', 'en-us;q=0.8,en;q=0.3');
$mech->add_header('Accept-Encoding', 'gzip,deflate');
$mech->add_header('Accept-Charset', 'utf-8;q=0.7,*;q=0.7');

# We get the content of the base URL
#print "URL:$urlbase\n";
$mech->get($urlbase);
my $out = $mech->content();

# To obtain the final TSV one must bypass/emulate some changes usually done
# by javascript. The form action is not correct, the submit button is not a
# submit one, and the hiddenID unlocks the request and indicates the kind of
# request to the server.

# We modify the form action to bypass a javascript submit:
my ($base) = $out =~ /base href="(.*?)"/;
$out =~ s/action=".*?" /action="$base" /;

# We modify the button to bypass the javascript submit:
$out =~ s/<input type="button" value="Display Data" Class="RegularButton" Width="70" onclick="frmsubmit\('1'\)" >/<input type="submit" name="Display Data" value="Display Data" Class="RegularButton" Width="70" onclick="frmsubmit('1')" >/;

# We update the html page with our non-javascript submit:
$mech->update_html($out);

# We then submit the form, activating the hiddenID lock (another javascript bypass).
# IT IS IN THIS WWW::Mechanize 'submit_form' METHOD THAT THE LWP::UserAgent
# CALL "$protocol->request" (LINE 193) HANGS ON A HUNG 'TASE' SERVER:
$mech->submit_form(
    form_name => "Form1",
    fields    => { 'HistoryData1$hiddenID' => "1" },
    button    => "Display Data",
);

# And this is the last stage of the download request:
$mech->get($urltsv);
#print "URL:$urltsv\n";
$out = $mech->content();

# We format $out to remove extra line feed characters
$out =~ s/[\r\n]+/\n/g;

my $response = $mech->response;
my $filename = $response->filename;
$filename = "TSV2_$filename";

# Write the result as UTF-8, and stop with a message if the file cannot be opened
open(my $tsv, ">:encoding(utf8)", $filename) or die "Cannot open $filename: $!";
print $tsv $out;
close($tsv);
print "TSV saved to $filename\n";
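
One possible way to trace where the request stalls, as a minimal sketch rather than a confirmed fix: LWP::UserAgent lets you register callbacks for the phases of a request through add_handler, and WWW::Mechanize inherits this because it is a subclass. The helper name attach_trace and the log format below are hypothetical, chosen only for illustration. The last line it prints before the hang should show whether the request was sent, whether headers arrived, and whether body data is still arriving in small pieces. That last case matters because the LWP timeout (180 seconds by default) is documented as an inactivity timeout, so a server that keeps sending a little data may never trigger it.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# attach_trace() is a hypothetical helper, not part of LWP or WWW::Mechanize.
# It records each phase of a request in @$log and prints it to STDERR, so the
# last line printed shows how far the request got before it stalled.
sub attach_trace {
    my ($ua, $log) = @_;

    my $note = sub {
        my ($msg) = @_;
        push @$log, $msg;
        print STDERR scalar(localtime), " $msg\n";
    };

    # Runs just before the request is handed to the protocol module
    # (LWP::Protocol::http for http:// URLs).
    $ua->add_handler(request_send => sub {
        my ($request) = @_;
        $note->("SEND " . $request->method . " " . $request->uri);
        return;    # return nothing so the request goes out as usual
    });

    # Runs once the status line and headers have arrived.
    $ua->add_handler(response_header => sub {
        my ($response) = @_;
        $note->("HEADERS " . $response->status_line);
        return;
    });

    # Runs for every chunk of body data received.
    $ua->add_handler(response_data => sub {
        my ($response, $ua_obj, $handler, $data) = @_;
        $note->("DATA " . length($data) . " bytes");
        return 1;  # a true value keeps the handler active for later chunks
    });

    # Runs when the response is complete (or has failed).
    $ua->add_handler(response_done => sub {
        my ($response) = @_;
        $note->("DONE " . $response->status_line);
        return;
    });

    return $ua;
}
```

In the program above this could be used right after the Mechanize object is created, for example attach_trace($mech, \my @log); followed by $mech->timeout(30); where the 30 seconds is an arbitrary choice. If the trace stops after SEND, the server never answered; if DATA lines keep appearing, the connection is active and the inactivity timeout will not fire.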