bruce wrote at 6/29/2006 2:27 PM:

>i'm in the middle of creating/modifying a web crawler/spider app... however,
>i'm looking to be able to have a generalizable function, where i can
>actually specify 'plugin' functionality to determine if i should extract
>information for a given page....
>  
>
Have you considered using an event-driven parser like HTML::Parser? You 
could set up event handlers for various conditions (start of tag, text, 
end of tag, etc.) and then use those handlers to run other functions 
based on the tag or other condition encountered.
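A minimal sketch of that idea, using HTML::Parser's version-3 handler API. The plugin table (%plugins) and the specific callbacks are hypothetical — your app would register whatever extraction functions it needs — but start_h and the "tagname, attr" argspec are standard HTML::Parser usage:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical plugin table: tag name => callback run when that tag starts.
my %plugins = (
    a   => sub { my ($attr) = @_; print "link: $attr->{href}\n" if $attr->{href} },
    img => sub { my ($attr) = @_; print "image: $attr->{src}\n" if $attr->{src} },
);

my $parser = HTML::Parser->new(
    api_version => 3,
    start_h     => [
        sub {
            my ($tagname, $attr) = @_;
            # Dispatch to a plugin only if one is registered for this tag.
            $plugins{$tagname}->($attr) if $plugins{$tagname};
        },
        "tagname, attr",    # argspec: which values the handler receives
    ],
);

$parser->parse('<a href="http://example.com/">x</a><img src="pic.png">');
$parser->eof;
```

Because the handlers fire as the input is parsed, you can also feed the page to the parser in chunks (repeated parse() calls) rather than slurping it all first.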

One advantage of this approach is that there is no need to load the 
entire document structure into memory, which could prove helpful when 
crawling large web sites with many pages of undetermined size.

Another approach is the FEAR::API module's mini-language. Perl.com 
recently published an article about it: 
http://www.perl.com/pub/a/2006/06/01/fear-api.html

Developing "plugins" for your application as Perl modules would require 
a little more learning. I'm not sure how familiar you are with writing 
modules in Perl; if the answer is "not at all", take a look at some of 
the included documentation (perlmod, perlmodlib, perlboot, etc.) or 
maybe a book like O'Reilly's "Intermediate Perl".
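For a taste of what a plugin-as-module might look like, here is a bare-bones sketch. The package name and the wants_page/extract interface are entirely made up for illustration — the point is just the shape of a module (package declaration, subs, trailing true value) that your crawler could load and call:

```perl
# MyCrawler/Plugin/Links.pm  (hypothetical module and interface)
package MyCrawler::Plugin::Links;
use strict;
use warnings;

# Decide whether this plugin should handle a given URL.
sub wants_page {
    my ($class, $url) = @_;
    return $url =~ /\.html?$/;
}

# Extract data from page content; here, a naive href grab.
sub extract {
    my ($class, $content) = @_;
    return [ $content =~ /href="([^"]+)"/g ];
}

1;  # a module must end with a true value
```

The crawler would then do something like: run extract() only when wants_page() returns true, and collect the results.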

Hope this helps.

_______________________________________________
ActivePerl mailing list
[email protected]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs