Hi Jeswin,

On Wed, 26 Oct 2011 09:04:32 -0400
Jeswin <phillyj...@gmail.com> wrote:

> Hi all,
> I'm still a beginner but I have a project I want to work on.
> 
> I want to pull price data from a website and would like your advice on
> getting started.
> 
> This is my idea and a basic implementation of the process:
> 
> 1) The input is converted to the web link, i.e., if I type in "force of will"
> the output is
> http://sales.starcitygames.com//search.php?substring=Force+of+Will&auto=Y
> 2) Somehow, I ask perl to go to the link and get the prices and take an
> average or display individual prices.
> 
> I see that using the filter (and a longer, more complex web link) I can get
> the web output displayed as a simple chart [1].
> Looking at the html source, the price data is displayed as "<td class=
> "deckdbbody2">$1.99&nbsp;</td>" . So maybe I can get a regexp to get all the
> different prices and list them.
> 
> What should I be looking at to learn more on doing this? Is there a better
> way?

First of all, you should use WWW::Mechanize or something similar to fetch
the page. Then use XML::LibXML's HTML parsing mode, HTML::TreeBuilder, or a
similar parser to extract the data from the HTML. Do *not* parse HTML using
regular expressions:

http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html

Oct 13 16:53:51 <rindolf>       perlbot: html
Oct 13 16:53:51 <perlbot>       rindolf: Don't parse or modify html with 
regular expressions! See one of HTML::Parser's subclasses: HTML::TokeParser, 
HTML::TokeParser::Simple, HTML::TreeBuilder(::Xpath)?, HTML::TableExtract etc. 
If your response begins "that's overkill. i only want to..." you are wrong. 
http://en.wikipedia.org/wiki/Chomsky_hierarchy and http://xrl.us/bf4jh6 for why 
not to use regex on HTML
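
To make this concrete, here is a minimal sketch of that approach, not a
finished program. It assumes the prices still live in <td> cells with the
class "deckdbbody2", as in your message; the site's markup may be different.
The sub names (fetch_html, extract_prices, average) and the sample HTML are
made up for illustration. WWW::Mechanize and HTML::TreeBuilder both need to
be installed from CPAN.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TreeBuilder;

# Fetch a page and return its HTML. WWW::Mechanize is loaded lazily so the
# parsing code below can be tried without network access.
sub fetch_html {
    my ($url) = @_;
    require WWW::Mechanize;
    my $mech = WWW::Mechanize->new( autocheck => 1 );   # die on HTTP errors
    $mech->get($url);
    return $mech->content;
}

# Return the dollar amounts found in <td class="deckdbbody2"> cells.
# ASSUMPTION: the class name comes from the original message.
sub extract_prices {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    my @prices;
    for my $cell ( $tree->look_down( _tag => 'td', class => 'deckdbbody2' ) ) {
        # as_text gives the decoded text of the cell (&nbsp; included).
        my $text = $cell->as_text;

        # The HTML is already parsed here; this regex only picks the number
        # out of a short plain-text string such as "$1.99".
        if ( $text =~ /\$\s*([0-9]+(?:\.[0-9]+)?)/ ) {
            push @prices, $1;
        }
    }

    $tree->delete;    # free the tree explicitly
    return @prices;
}

# Arithmetic mean of a list of numbers; returns undef for an empty list.
sub average {
    my @numbers = @_;
    return unless @numbers;
    my $sum = 0;
    $sum += $_ for @numbers;
    return $sum / @numbers;
}

# Hypothetical sample markup, used when no URL is given on the command line.
my $sample_html = <<'END_HTML';
<table>
<tr><td class="deckdbbody2">Force of Will</td>
    <td class="deckdbbody2">$1.99&nbsp;</td></tr>
<tr><td class="deckdbbody2">Force of Will</td>
    <td class="deckdbbody2">$2.49&nbsp;</td></tr>
</table>
END_HTML

my $html   = @ARGV ? fetch_html( $ARGV[0] ) : $sample_html;
my @prices = extract_prices($html);

if (@prices) {
    print "Prices: ", join( ", ", map { "\$$_" } @prices ), "\n";
    printf "Average: \$%.2f\n", average(@prices);
}
else {
    print "No prices found.\n";
}
```

Run it without arguments to try the parser on the sample markup, or pass the
search URL as the first argument to fetch the live page. If you prefer XPath
queries, HTML::TreeBuilder::XPath or XML::LibXML can do the same job.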

Regards,

        Shlomi Fish


-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
"The Human Hacking Field Guide" - http://shlom.in/hhfg

Larry Wall *does* know all of Perl. However, he pretends to be wrong
or misinformed, so people will underestimate him.

Please reply to list if it's a mailing list post - http://shlom.in/reply .
