Hi All: I am working on creating a vertical search engine using Nutch. I understand Nutch from a user perspective and am able to crawl the desired websites and search the indexes. I also installed the Nutch 0.7.2 codebase and am able to modify the code. However, I do not understand Nutch well enough to know how to get only the desired content from the sites. After crawling I get too much data, with useless links mixed in among the useful ones. How can I filter the content to make it useful? Which classes do I need to modify?
Thanks in advance for your help! Regards, Raghav