On Wed, Dec 24, 2008 at 22:09, Collaborate <tolg...@yahoo.com> wrote:
> On Dec 23, 3:34 pm, nore...@gunnar.cc (Gunnar Hjalmarsson) wrote:
>> Collaborate wrote:
>> > I am wondering if there is a way to copy a webpage to a text file
>> > using Perl.
>>
>>      use LWP::Simple;
>>      my $url = 'http://www.example.com/';
>>      open my $fh, '>', 'webpage.txt' or die $!;
>>      print $fh get $url;
>>
>> --
>> Gunnar Hjalmarsson
>> Email:http://www.gunnar.cc/cgi-bin/contact.pl
>
> Thanks for your input. This is a really simple code to grab the html
> code and works great. Is there also a way to grab the text shown on
> the webpage rather than the html code? For example, the text on pages
> that are javascript (such as Yahoo Finance stock quotes) do not show
> on the html code. I would like to be able to copy the text even it is
> javascript. Thanks.
snip

There is currently no easy way in Perl (by itself) to get data created
by webpages that use JavaScript.  There is the JE module* which allows
execution of arbitrary JavaScript code, but executing JavaScript is
not enough; you also need the environment the code executes in.  In
this case it is the DOM (Document Object Model).  Happily, there is a
module that purports to understand the DOM: HTML::DOM**.  And there is
even an effort to marry the two modules to WWW::Mechanize*** (an LWP
frontend that simplifies session-based web browsing):
WWW::Mechanize::Plugin::JavaScript****, but it looks like you need an
experimental version of WWW::Mechanize to use it.
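
To illustrate why running JavaScript alone is not enough, here is a
minimal sketch using JE on its own.  It assumes JE is installed from
CPAN; run_js is a hypothetical helper name made up for this example,
not part of JE.  The first call shows that plain expressions evaluate
fine; the second asks for the type of `document`, which a bare JE
interpreter should report as undefined because no DOM has been
supplied.

```perl
use strict;
use warnings;
use JE;    # CPAN module, not part of core Perl

# Hypothetical helper (not part of JE): evaluate a JavaScript
# expression in a fresh interpreter and return it as a Perl string.
sub run_js {
    my ($source) = @_;
    my $js     = JE->new;
    my $result = $js->eval($source);

    # JE's eval returns undef and sets $@ when the script fails.
    die "JavaScript error: $@" unless defined $result;

    return "$result";    # JE values stringify to their JS string form
}

# Plain computation works without any browser environment.
print run_js('1 + 2'), "\n";

# A page script that touches the DOM has nothing to work with here.
print run_js('typeof document'), "\n";
```

Any script on a real page that calls something like
document.getElementById would fail in this bare interpreter, which is
the gap HTML::DOM and the WWW::Mechanize plugin try to fill.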

The alternative is to use a program that already understands HTML,
DOM, JavaScript, etc. and that can be controlled from Perl.  I know of,
but have not used, Mozilla::Mechanize*****, Win32::IE::Mechanize******,
and Win32::IEAutomation*******.  Unfortunately, the first two were
last updated in 2005 and the last in 2006.

* http://search.cpan.org/dist/JE/lib/JE.pm
** http://search.cpan.org/dist/HTML-DOM/lib/HTML/DOM.pm
*** http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize.pm
**** http://search.cpan.org/dist/WWW-Mechanize-Plugin-JavaScript/lib/WWW/Mechanize/Plugin/DOM.pm
***** http://search.cpan.org/~slanning/Mozilla-Mechanize/lib/Mozilla/Mechanize.pm
****** http://search.cpan.org/dist/Win32-IE-Mechanize/lib/Win32/IE/Mechanize.pm
******* http://search.cpan.org/dist/Win32-IEAutomation/lib/Win32/IEAutomation.pm

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.


