Raghav Kapoor
Sun, 10 Aug 2008 18:11:43 -0700
Hi All:
I am working on creating a vertical search engine using Nutch.
I understand Nutch from a user's perspective and am able to crawl the desired
websites and search the indexes. I also installed the Nutch 0.7.2 codebase
and am able to modify the code. However, I do not understand Nutch well enough
to know how I can get the desired content from the sites. After crawling, I get
too much data, with useful as well as useless links. How can I filter the
content to make it useful?
Which classes do I need to modify?
Thanks in advance for your help!
Regards,
Raghav