Hi All:

I am working on creating a vertical search engine using Nutch.
I understand Nutch from a user's perspective and am able to crawl the desired
websites and search the resulting indexes. I have also installed the Nutch 0.7.2
codebase and am able to modify the code. However, I do not understand Nutch well
enough to know how to get only the desired content from the sites. After crawling,
I get too much data, with useless links mixed in among the useful ones. How can I
filter the content to make it useful?
Which classes do I need to modify?

Thanks in advance for your help!

Regards,
Raghav 



      
