Hi there! I'm building a crawler that will "understand" certain kinds of pages. It only needs to process a restricted set of websites.
For example, I want to search a few blogs I know well for reviews of my company's products. I don't know whether Nutch can help me here. What I currently have is a crawler that fetches pages, transforms each one with an XSLT template written for that site, and then parses the content. So my question is: can this be done well with Nutch, or would it add a lot of overhead? Which plugins would need to be developed? Thank you!
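Below is a minimal sketch of the per-site template extraction step described above, written against several stated assumptions. Python's standard library has no XSLT processor, so this version swaps the per-site XSLT stylesheets for per-site ElementPath expressions from `xml.etree.ElementTree`, which is a much more limited XPath subset. It also assumes pages are already fetched and are well-formed XHTML (real blog HTML usually needs cleaning first). The host `blog.example.com`, the class names, and the helpers `SITE_TEMPLATES`, `extract_review`, and `mentions_product` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical per-site templates: host -> {field: ElementPath expression}.
# They play the role that one XSLT stylesheet per site plays in the pipeline above.
SITE_TEMPLATES = {
    "blog.example.com": {
        "title": ".//h2[@class='post-title']",
        "body": ".//div[@class='post-body']",
    },
}


def extract_review(url: str, xhtml: str) -> dict:
    """Apply the template registered for the URL's host to a well-formed XHTML page."""
    host = urlparse(url).hostname
    template = SITE_TEMPLATES.get(host)
    if template is None:
        # Only the restricted set of known sites is handled.
        raise KeyError(f"no template registered for {host}")
    root = ET.fromstring(xhtml)  # raises ParseError on malformed markup
    record = {"url": url}
    for field, path in template.items():
        node = root.find(path)
        # itertext() also collects text nested in inline tags such as <b> or <a>.
        record[field] = "".join(node.itertext()).strip() if node is not None else None
    return record


def mentions_product(record: dict, product: str) -> bool:
    """Case-insensitive check of whether any extracted field mentions the product."""
    text = " ".join(v for v in record.values() if isinstance(v, str))
    return product.lower() in text.lower()
```

For instance, `extract_review("http://blog.example.com/review", page)` on a page containing `<h2 class="post-title">` and `<div class="post-body">` returns a dict with `url`, `title`, and `body`, which `mentions_product` can then filter. Keeping the site-specific knowledge in a data table rather than in code is what makes adding a new blog a matter of writing one more template.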

