RE: XML::RSS scraping of web pages

Gary Nielson Sun, 24 Oct 2004 08:59:58 -0700

I'm sorry. I wasn't precise enough. I meant that this corporation didn't
offer news feeds in RSS format of their own news articles from their
websites. They said their technology department had not yet implemented it
in their set of Java tools and that it might be a while, given that they had
other priorities. I suggested that a Perl script using HTML::TokeParser and
XML::RSS -- run by them to scrape their content right off their web pages
and convert it to RSS -- could get the job done right now. I didn't see any
downsides, but they said their techs
"cringe" every time they hear the word "scraping." I don't understand why.
What are the technical downsides to using Perl's RSS tools?
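
To make the question concrete, here is a rough sketch of the kind of
scrape-and-convert step I mean. It is not the Perl script itself: it uses
Python's standard library in place of HTML::TokeParser and XML::RSS so that
it runs with nothing extra installed. The class="headline" marker, the
example.com URLs, and the names HeadlineParser and page_to_rss are all
assumptions made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


class HeadlineParser(HTMLParser):
    """Collect (title, href) pairs from <a class="headline"> links."""

    def __init__(self):
        super().__init__()
        self.items = []      # finished (title, href) pairs
        self._href = None    # href of the headline link being read, if any
        self._text = []      # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Assumption: the page marks headline links with class="headline".
        if tag == "a" and attrs.get("class") == "headline" and attrs.get("href"):
            self._href = attrs["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the link text.
            title = " ".join("".join(self._text).split())
            self.items.append((title, self._href))
            self._href = None


def page_to_rss(html, base_url, feed_title):
    """Return an RSS 2.0 document (as a string) listing the page's headlines."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = base_url
    ET.SubElement(channel, "description").text = "Headlines scraped from " + base_url
    for title, href in parser.items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        # Resolve relative links against the page URL.
        ET.SubElement(item, "link").text = urljoin(base_url, href)
    return ET.tostring(rss, encoding="unicode")


# Hypothetical page fragment standing in for a real news index page.
SAMPLE = """
<html><body>
  <a class="headline" href="/news/1.html">First   story</a>
  <a href="/about.html">About us</a>
  <a class="headline" href="/news/2.html">Second story</a>
</body></html>
"""

if __name__ == "__main__":
    print(page_to_rss(SAMPLE, "http://www.example.com/", "Example News"))
```

One thing the sketch does show: the extraction depends entirely on the page
markup. If the class name on the headline links changes, the feed silently
comes back with no items.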


Gary 

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Terris Linenbach
Sent: Saturday, October 23, 2004 8:35 PM
To: Gary Nielson
Cc: [EMAIL PROTECTED]
Subject: Re: XML::RSS scraping of web pages

RSS feeds that are based on scraping are likely illegal from a copyright
perspective.  The problem with RSS is that it allows for actual content that
can be read for the most part without visiting the originating site and
viewing advertisements that are the basis for a revenue model.  This
goes beyond "fair use" -- you can quote an article but not the whole thing
verbatim.

I think the portals will eventually provide categorized RSS feeds that only
contain a few lines of text and a hyperlink to the site.  The webmasters
that get the traffic will eventually appreciate the business.

Sort of like Google News, but for wider topics than just your everyday news.

On Sat, 23 Oct 2004 20:16:05 -0400, Gary Nielson <[EMAIL PROTECTED]>
wrote:
> I have a technical/philosophical question about scraping of web pages
> for RSS feeds. You all helped me a while back in figuring out the use
> of HTML::TokeParser and XML::RSS to extract headlines and links from
> web pages.
> As far as I am concerned, scraping of a web page is the same as
> accessing a web page through a web browser... unless I am missing
> something... But I was talking to techies at a big corporation who
> manage web sites and they were telling me that they cringe whenever
> the word "scraping" is used, that it is like chalk on a blackboard to
> them... I didn't have the chance to pursue this with them, but it
> surprised me. So what is wrong with "scraping," or what are the
> disadvantages of it over other technologies? Aren't many of the RSS
> feeds out there as a result of scraping? Any insights appreciated.
> 
> _______________________________________________
> ActivePerl mailing list
> [EMAIL PROTECTED]
> To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
>