I run http://10.am and do this on a largish scale.

For aggregating RSS feeds I use RSSLite [1] rather than XML::RSS. RSSLite
avoids using expat and is a little naughty in that it will parse XML that
would make expat barf (a lot of RSS feeds unfortunately contain bad XML).
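
To give a feel for the lenient approach, here is a minimal sketch that
pulls titles and links out of a feed with regexps instead of a strict XML
parser. It is not RSSLite's actual code or interface: the function name
lenient_rss_items and the sample feed are made up for illustration, and it
only assumes that each story sits in an <item> element with <title> and
<link> children.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of RSSLite): return a list of hashrefs
# with 'title' and 'link' keys, tolerating XML that is not well-formed.
sub lenient_rss_items {
    my ($xml) = @_;
    my @items;
    while ($xml =~ m{<item\b[^>]*>(.*?)</item\s*>}gis) {
        my $body = $1;
        my %item;
        for my $field (qw(title link)) {
            next unless $body =~ m{<$field\b[^>]*>\s*(.*?)\s*</$field\s*>}is;
            my $value = $1;
            # Unwrap a CDATA section if the feed used one.
            $value =~ s{\A<!\[CDATA\[(.*?)\]\]>\z}{$1}s;
            $item{$field} = $value;
        }
        push @items, \%item if %item;
    }
    return @items;
}

# A deliberately broken feed: a bare ampersand and unclosed
# <channel> and <rss> elements, which a strict parser would reject.
my $feed = <<'END';
<rss version="0.91"><channel>
<item><title>Perl & friends</title><link>http://example.com/1</link></item>
<item><title>Second story</title><link>http://example.com/2</link></item>
END

printf "%s <%s>\n", $_->{title}, $_->{link} for lenient_rss_items($feed);
```

The trade-off is the obvious one: this happily accepts junk, so it will
also happily return junk if a feed is broken in a way the patterns do not
expect.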

For actual scraping of sites I basically use meaty regexps or HTML::Parser.
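
As a rough illustration of the regexp route (not the code 10.am runs), the
sketch below collects the links on a page whose href matches a per-site
pattern and treats the link text as the headline. The function name
scrape_headlines, the sample page and the /news/ pattern are all
assumptions made up for the example; real sites need their own patterns.

```perl
use strict;
use warnings;

# Hypothetical helper: return [url, text] pairs for every anchor whose
# href matches $href_re. Copes with upper-case tags, single or double
# quotes around the href, and extra attributes before it.
sub scrape_headlines {
    my ($html, $href_re) = @_;
    my @headlines;
    while ($html =~ m{<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1[^>]*>(.*?)</a\s*>}gis) {
        my ($url, $text) = ($2, $3);
        next unless $url =~ $href_re;
        $text =~ s/<[^>]+>//g;        # drop nested tags such as <b>
        $text =~ s/\s+/ /g;           # collapse runs of whitespace
        $text =~ s/^\s+|\s+$//g;      # trim both ends
        push @headlines, [$url, $text] if length $text;
    }
    return @headlines;
}

# Made-up page fragment: one navigation link and two story links.
my $page = <<'END';
<a href="/about.html">About us</a>
<A HREF="/news/2001/03/07/one.html"><b>First
   headline</b></A>
<a class=story href='/news/2001/03/07/two.html'>Second headline</a>
END

print "$_->[1] ($_->[0])\n" for scrape_headlines($page, qr{^/news/});
```

Regexps like this break whenever a site changes its layout, which is
where an event-driven parser such as HTML::Parser tends to hold up better.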

10.am also supplies feeds [2] in RSS if you want to use them.

I hope to open source 10.am in the near future, once I have sorted out
some contractual obligations.

mallum

[1] http://industrial-linux.org/RSSLite/
[2] http://10.am/docs/feeds.htm (e.g. http://10.am/Development/Perl-rss )

On Wed, Mar 07, 2001 at 04:36:56PM +0000, Dave Hodgkinson wrote:
> 
> What's the best way to scrape a variety of news headlines from various
> sites? Sort of a moreover for the intranet...
> 
> 
> -- 
> Dave Hodgkinson,                             http://www.hodgkinson.org
> Editor-in-chief, The Highway Star           http://www.deep-purple.com
>       Apache, mod_perl, MySQL, Sybase hired gun for, well, hire
>   -----------------------------------------------------------------
> 
