Hi,
I am researching the best way to set up a robot (crawler) that searches Internet news sites for articles relevant to my website, meaning anything about security, hacking, or privacy. I have a list of sites I'd like to crawl, as well as keywords to search for. I'd also like to be able to parse out the data from each article on my end: title, author, text, date of publication, URL, etc.

One issue I am aware of is that not every site likes people running bots against it. Are there ways around this? Does anyone know whether any tech news sites offer an XML or other file-format export of their articles? For large sites where I get lots of articles, this would work well.

My goal is to have the articles and relevant info come to me, instead of manually going out to sites to find articles about security.

Any help, or a pointer in the right direction, would be much appreciated.

Christopher Braswell

--
This message was sent by the Internet robots and spiders discussion list ([EMAIL PROTECTED]). For list server commands, send "help" in the body of a message to "[EMAIL PROTECTED]".