On Wed, Dec 24, 2008 at 22:09, Collaborate <tolg...@yahoo.com> wrote:
> On Dec 23, 3:34 pm, nore...@gunnar.cc (Gunnar Hjalmarsson) wrote:
>> Collaborate wrote:
>> > I am wondering if there is a way to copy a webpage to a text file
>> > using Perl.
>>
>> use LWP::Simple;
>> my $url = 'http://www.example.com/';
>> open my $fh, '>', 'webpage.txt' or die $!;
>> print $fh get $url;
>>
>> --
>> Gunnar Hjalmarsson
>> Email:http://www.gunnar.cc/cgi-bin/contact.pl
>
> Thanks for your input. This is a really simple code to grab the html
> code and works great. Is there also a way to grab the text shown on
> the webpage rather than the html code? For example, the text on pages
> that are javascript (such as Yahoo Finance stock quotes) do not show
> on the html code. I would like to be able to copy the text even it is
> javascript. Thanks.
snip
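
For pages whose text is already present in the HTML, something along
these lines may be enough. Treat it as a rough sketch rather than a
finished answer: it assumes HTML::TreeBuilder is installed from CPAN,
html_to_text is just a name I made up for the helper, and the URL is
taken from the command line. It only strips markup from what the
server sends; anything a page builds with JavaScript will not appear.

```perl
use strict;
use warnings;
use LWP::Simple qw(get);
use HTML::TreeBuilder;

# Hypothetical helper: return the text content of an HTML string.
# Only static markup is handled; no JavaScript is executed.
sub html_to_text {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my $text = $tree->as_text;    # text of all descendants, tags dropped
    $tree->delete;                # free the parse tree
    return $text;
}

# Fetch and save only when a URL is given on the command line.
if (my $url = shift @ARGV) {
    my $html = get($url);
    die "Could not fetch $url\n" unless defined $html;
    open my $fh, '>', 'webpage.txt' or die "Cannot open webpage.txt: $!";
    print {$fh} html_to_text($html), "\n";
    close $fh or die "Cannot close webpage.txt: $!";
}
```

Saved as, say, page2text.pl (the file name is arbitrary), it would be
run as "perl page2text.pl http://www.example.com/" and write the
result to webpage.txt, the same file name used in the quoted code.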
There is currently no easy way in Perl (by itself) to get data created
by webpages that use JavaScript. There is the JE module* which allows
execution of arbitrary JavaScript code, but executing JavaScript is not
enough; you also need the environment the code executes in. In this
case it is the DOM (Document Object Model). Happily, there is a module
that purports to understand the DOM: HTML::DOM**. And there is even an
effort to marry the two modules to WWW::Mechanize*** (an LWP frontend
that simplifies session-based web browsing):
WWW::Mechanize::Plugin::JavaScript****, but it looks like you need an
experimental version of WWW::Mechanize to use it.

The alternative is to use a program that already understands HTML,
DOM, JavaScript, etc. and that can be controlled from Perl. I know of,
but have not used, Mozilla::Mechanize*****, Win32::IE::Mechanize******,
and Win32::IEAutomation*******. Unfortunately, the first two were last
updated in 2005 and the last in 2006.

* http://search.cpan.org/dist/JE/lib/JE.pm
** http://search.cpan.org/dist/HTML-DOM/lib/HTML/DOM.pm
*** http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize.pm
**** http://search.cpan.org/dist/WWW-Mechanize-Plugin-JavaScript/lib/WWW/Mechanize/Plugin/DOM.pm
***** http://search.cpan.org/~slanning/Mozilla-Mechanize/lib/Mozilla/Mechanize.pm
****** http://search.cpan.org/dist/Win32-IE-Mechanize/lib/Win32/IE/Mechanize.pm
******* http://search.cpan.org/dist/Win32-IEAutomation/lib/Win32/IEAutomation.pm

--
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.