Hi Jeswin,

On Wed, 26 Oct 2011 09:04:32 -0400
Jeswin <phillyj...@gmail.com> wrote:
> Hi all,
> I'm still a beginner but I have a project I want to work on.
>
> I want to pull price data from a website and would like your advice on
> getting started.
>
> This is my idea and a basic implementation of the process:
>
> 1) The input is converted to the web link, i.e., if I type in "force of will"
> the output is
> http://sales.starcitygames.com//search.php?substring=Force+of+Will&auto=Y
> 2) Somehow, I ask perl to go to the link and get the prices and take an
> average or display individual prices.
>
> I see that using the filter (and a longer, more complex web link) I can get
> the web output displayed as a simple chart [1].
> Looking at the html source, the price data is displayed as
> "<td class="deckdbbody2">$1.99 </td>". So maybe I can get a regexp to get
> all the different prices and list them.
>
> What should I be looking at to learn more on doing this? Is there a better
> way?

First of all, you should be using WWW::Mechanize or something similar to
perform the web automation. Then you should use XML::LibXML's HTML parsing
mode or HTML::TreeBuilder or similar to retrieve the data from the HTML.

Do *not* parse HTML using regular expressions:

http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html

Oct 13 16:53:51 <rindolf> perlbot: html
Oct 13 16:53:51 <perlbot> rindolf: Don't parse or modify html with regular expressions! See one of HTML::Parser's subclasses: HTML::TokeParser, HTML::TokeParser::Simple, HTML::TreeBuilder(::Xpath)?, HTML::TableExtract etc. If your response begins "that's overkill. i only want to..." you are wrong. http://en.wikipedia.org/wiki/Chomsky_hierarchy and http://xrl.us/bf4jh6 for why not to use regex on HTML

Regards,

	Shlomi Fish

-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
"The Human Hacking Field Guide" - http://shlom.in/hhfg

Larry Wall *does* know all of Perl.
However, he pretends to be wrong or misinformed, so people will
underestimate him.

Please reply to list if it's a mailing list post - http://shlom.in/reply .

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
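A minimal sketch of the approach recommended in this reply, covering both steps from the question: it builds the search URL, fetches the page with WWW::Mechanize, and pulls the prices out of the parsed tree with HTML::TreeBuilder's look_down instead of running a regexp over the HTML. The sub names (search_url, fetch_page, extract_prices) are hypothetical; the URL format and the td class="deckdbbody2" price cells are assumed from the question, and the site's real markup may differ. HTML::TreeBuilder and WWW::Mechanize must be installed from CPAN.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TreeBuilder;     # CPAN (HTML-Tree distribution)
use List::Util qw(sum);    # core module

# Search URL format copied from the example link in the question.
my $SEARCH_BASE = 'http://sales.starcitygames.com//search.php';

# Step 1: turn a card name such as "Force of Will" into a search URL.
# Spaces become "+" and other unsafe characters are percent-encoded,
# the way a browser encodes a submitted form.
sub search_url {
    my ($card_name) = @_;
    my $q = $card_name;
    utf8::encode($q);    # work on UTF-8 bytes so every escape is %XX
    $q =~ s/([^A-Za-z0-9\-._~ ])/sprintf('%%%02X', ord $1)/ge;
    $q =~ tr/ /+/;
    return "$SEARCH_BASE?substring=$q&auto=Y";
}

# Step 2a: download the page. WWW::Mechanize is loaded only here, so the
# other subs can be used (and tested) without network access.
sub fetch_page {
    my ($url) = @_;
    require WWW::Mechanize;
    my $mech = WWW::Mechanize->new(autocheck => 1);    # die on HTTP errors
    $mech->get($url);
    return $mech->content;
}

# Step 2b: collect the prices from the parsed HTML tree.
sub extract_prices {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @prices;
    for my $td ($tree->look_down(_tag => 'td', class => 'deckdbbody2')) {
        # The regexp only checks the cell's plain text, e.g. "$1.99".
        # Assumption: other cells with this class may hold non-price text.
        my $text = $td->as_trimmed_text;
        push @prices, $1 if $text =~ /^\$(\d+(?:\.\d+)?)$/;
    }
    $tree->delete;    # explicitly free the tree
    return @prices;
}

sub main {
    my $card   = join ' ', @_;
    my @prices = extract_prices(fetch_page(search_url($card)));
    die "No prices found for '$card'\n" unless @prices;
    printf "%s: %s\n", $card, join(', ', map { '$' . $_ } @prices);
    printf "Average: \$%.2f\n", sum(@prices) / @prices;
}

# Run only when a card name is given, e.g.: perl prices.pl force of will
main(@ARGV) if @ARGV;
```

Because WWW::Mechanize is loaded with require inside fetch_page, the URL builder and the parser can also be run offline, for example against a saved copy of the search page.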