Hi,

I need to build a system that periodically crawls a given set of RSS feed URLs. For each RSS feed, the system needs to maintain a master RSS feed that contains every item ever seen: even after old items drop out of the live feed, the master feed still contains them.
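The core of that requirement is a merge step: on each crawl, add any items not seen before to a per-feed store and never delete from it. Below is a minimal sketch in Python using only the standard library. It assumes RSS 2.0 feeds (items under `<channel>`), identifies items by `<guid>` with `<link>` or `<title>` as fallbacks, and keeps the master store in an in-memory dict. All function names are hypothetical, scheduling is left to cron or a loop, and this is not Nutch's API; a real system would persist the store to disk or a database.

```python
import urllib.request
import xml.etree.ElementTree as ET


def item_key(item: ET.Element) -> str:
    """Identify an RSS 2.0 <item>: prefer <guid>, fall back to <link>, then <title>."""
    for tag in ("guid", "link", "title"):
        text = item.findtext(tag)
        if text and text.strip():
            return text.strip()
    # Last resort: use the serialized item itself as its identity.
    return ET.tostring(item, encoding="unicode")


def merge_into_master(master: dict, feed_xml) -> int:
    """Add items from one fetched feed snapshot (str or bytes) to `master`.

    `master` maps item key -> serialized <item> XML. Existing entries are
    never removed or overwritten, so items that later drop out of the live
    feed stay in the master copy. Returns the number of newly added items.
    """
    root = ET.fromstring(feed_xml)
    added = 0
    for item in root.iter("item"):
        key = item_key(item)
        if key not in master:
            master[key] = ET.tostring(item, encoding="unicode")
            added += 1
    return added


def render_master(master: dict, title: str) -> str:
    """Serialize all accumulated items as one RSS 2.0 document (first-seen order)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    for item_xml in master.values():
        channel.append(ET.fromstring(item_xml))
    return ET.tostring(rss, encoding="unicode")


def crawl_once(feed_urls, masters: dict) -> None:
    """Fetch each feed once and merge it into its own master store.

    Intended to be called periodically (e.g. from cron); `masters` maps
    feed URL -> master dict. Network errors are not handled in this sketch.
    """
    for url in feed_urls:
        with urllib.request.urlopen(url, timeout=30) as resp:
            data = resp.read()
        merge_into_master(masters.setdefault(url, {}), data)
```

Keying on `<guid>` matters because the same item usually reappears in several consecutive crawls; without a stable identity the master feed would fill with duplicates. Since the dict preserves insertion order, the master feed lists items in the order they were first seen, which may differ from the live feed's newest-first order.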
Does something similar to this already exist? I noticed a couple of mail threads about this, but it's not clear whether Nutch is the right framework for a task like this. I would really appreciate any pointers, comments, or suggestions.

Thanks,
Manoj
