Sorry for repeating this question, but I have to find a solution; otherwise
the crawling is too slow to be practical.  I'm using nutch 0.9-dev on one
Linux server to crawl millions of pages.  The fetching itself is reasonable,
but the map-reduce operations are killing the performance. For example,
fetching takes 10 hours and map-reduce also takes another 10 hours, which
makes the overall throughput very slow. Can anyone share experience on how
to speed up map-reduce for single-server crawling?  A single server uses the
local file system, so it should spend very little time doing map and reduce,
shouldn't it?
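
For context, this is the kind of hadoop-site.xml override I have in mind --
just a sketch, assuming the stock property names from the Hadoop version
bundled with nutch 0.9, and the values are placeholders, not something I've
benchmarked:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- run jobs in-process on the single box (no separate job tracker) -->
  <property>
    <name>mapred.job.tracker</name>
    <value>local</value>
  </property>
  <!-- placeholder values: a larger in-memory sort buffer and merge factor
       should mean fewer spills/merges against the local disk -->
  <property>
    <name>io.sort.mb</name>
    <value>200</value>
  </property>
  <property>
    <name>io.sort.factor</name>
    <value>100</value>
  </property>
</configuration>
```

If anyone has tuned these (or other) properties for a local-filesystem
crawl, I'd appreciate hearing which ones actually made a difference.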

Thanks,
--
AJ Chen, PhD
http://web2express.org
