How do I limit the number of pages processed from each domain? And how do I set up Nutch to crawl only the domains I added (i.e. make Nutch ignore external links)? If Nutch doesn't support this, what algorithm would be best for it?
P.S. Nutch version 0.7
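If Nutch 0.7 has no built-in option for this, one generic approach is to filter URLs before they enter the crawl queue. A URL is dropped when its host is not one of your seed domains, which ignores external links, and it is also dropped once its domain has used up a fixed page budget. The sketch below is plain Python, not Nutch code. `BoundedFrontier`, `allowed_domains` and `max_pages_per_domain` are hypothetical names chosen for illustration. The sketch also assumes that a subdomain such as `www.example.com` counts as part of `example.com`.

```python
from collections import defaultdict, deque
from urllib.parse import urlparse


class BoundedFrontier:
    """Hypothetical crawl frontier: allowlisted domains only, capped pages per domain."""

    def __init__(self, allowed_domains, max_pages_per_domain):
        # Normalize the allowlist to lowercase host names.
        self.allowed = {d.lower() for d in allowed_domains}
        self.max_pages = max_pages_per_domain
        self.counts = defaultdict(int)  # URLs accepted so far, per allowlisted domain
        self.seen = set()               # URLs already accepted (deduplication)
        self.queue = deque()            # URLs waiting to be fetched, FIFO order

    def _domain(self, url):
        """Return the allowlisted domain that url belongs to, or None if external."""
        host = (urlparse(url).hostname or "").lower()
        for d in self.allowed:
            # Match the domain itself or any of its subdomains.
            if host == d or host.endswith("." + d):
                return d
        return None

    def add(self, url):
        """Enqueue url if it is internal, unseen, and its domain is under the cap."""
        domain = self._domain(url)
        if domain is None or url in self.seen:
            return False  # external link or duplicate: ignore it
        if self.counts[domain] >= self.max_pages:
            return False  # this domain has used up its page budget
        self.seen.add(url)
        self.counts[domain] += 1
        self.queue.append(url)
        return True

    def next_url(self):
        """Pop the next URL to fetch, or None when the queue is empty."""
        return self.queue.popleft() if self.queue else None
```

For example, `BoundedFrontier(["example.com"], 2)` accepts `http://example.com/a` and `http://www.example.com/b`. It then rejects a third `example.com` URL because the cap has been reached, and it rejects `http://other.org/` because that host is external. The budget is charged when a URL is enqueued, not when it is fetched. This keeps the queue bounded, but it also means that a fetch failure still uses up one slot.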