bruce wrote at 6/29/2006 2:27 PM:
> i'm in the middle of creating/modifying a web crawler/spider app... however,
> i'm looking to be able to have a generalizable function, where i can
> actually specify 'plugin' functionality to determine if i should extract
> information for a given page....

Have you considered using an event-driven parser like HTML::Parser? You could set up event handlers for various conditions (start of tag, text, end of tag, etc.) and then use those handlers to run other functions based on the tag or other condition encountered.
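Something along these lines might work (untested, and 'page.html' plus the handler bodies are just placeholders for whatever your crawler fetches):

    use strict;
    use warnings;
    use HTML::Parser;

    # Runs at every start tag; dispatch on the tag name here.
    sub start_handler {
        my ($tagname, $attr) = @_;
        print "link: $attr->{href}\n"
            if $tagname eq 'a' && defined $attr->{href};
    }

    # Runs for each chunk of text between tags.
    sub text_handler {
        my ($dtext) = @_;
        # decide here whether this text is worth extracting
    }

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [ \&start_handler, 'tagname, attr' ],
        text_h      => [ \&text_handler,  'dtext' ],
    );
    $p->parse_file('page.html');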
One advantage of this approach is that there is no need to load the entire document structure into memory, which could prove helpful when crawling large web sites with many pages of undetermined size.

Another approach is the FEAR::API module's mini-language. Perl.com recently published an article about it:

    http://www.perl.com/pub/a/2006/06/01/fear-api.html

Developing "plugins" for your application as Perl modules would require a little more learning. I'm not sure how familiar you are with writing modules in Perl; if the answer is "not at all", take a look at some of the included documentation (perlmod, perlmodlib, perlboot, etc.) or maybe a book like O'Reilly's "Intermediate Perl".
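To give you a rough idea, here's an untested sketch of one way such a plugin interface could look; the names (My::Plugin::Links, should_extract, extract, @plugins) are all made up for illustration:

    package My::Plugin::Links;
    use strict;
    use warnings;

    # Return true if this plugin wants to handle the page.
    sub should_extract {
        my ($class, $url, $content) = @_;
        return $url =~ m{example\.com/articles/};
    }

    # Pull out whatever data this plugin cares about.
    sub extract {
        my ($class, $url, $content) = @_;
        my @headings = $content =~ m{<h1[^>]*>(.*?)</h1>}gis;
        return { url => $url, headings => \@headings };
    }

    1;

The crawler would then just loop over its registered plugins after fetching each page:

    # after fetching $url into $content
    for my $plugin (@plugins) {
        next unless $plugin->should_extract($url, $content);
        my $data = $plugin->extract($url, $content);
        # ... store or process $data ...
    }

Hope this helps.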
