I've been planning to spend some time looking at this, but haven't gotten
around to it yet -- I see the same (serious) performance problems on a
single-machine setup. Reduce takes quite a bit longer than the fetch (map)
operation in my case, and this is on a very fast 4-CPU machine with a ton of
memory. It just doesn't seem like it should take this long. I'm using 0.8
plus some patches and local mods.

If you find some things, please let me know. Likewise, when I get around to
it, I will post my findings.

Thanks,

Doug



AJ Chen-2 wrote:
> 
> Sorry for repeating this question, but I have to find a solution;
> otherwise the crawling is too slow to be practical. I'm using nutch
> 0.9-dev on one linux server to crawl millions of pages. The fetching
> itself is reasonable, but the map-reduce operations are killing the
> performance. For example, fetching takes 10 hours and map-reduce also
> takes 10 hours, which makes the overall performance very slow. Can anyone
> share experience on how to speed up map-reduce for single-server
> crawling? The single server uses the local file system, so it should
> spend very little time doing map and reduce, shouldn't it?
> 
> Thanks,
> -- 
> AJ Chen, PhD
> http://web2express.org
> 
> 
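For the single-machine case described above, most of the relevant knobs live in Hadoop's conf/hadoop-site.xml. Below is a minimal sketch, not a tested recipe: the property names are from the Hadoop 0.x line bundled with Nutch 0.8/0.9, and the values are illustrative, so verify each one against the hadoop-default.xml shipped with your distribution.

```xml
<?xml version="1.0"?>
<!-- conf/hadoop-site.xml: single-machine overrides (illustrative values).
     Property names are from the Hadoop 0.x era; check hadoop-default.xml
     in your own distribution, since names changed between releases. -->
<configuration>
  <!-- The local runner executes map and reduce tasks serially in one JVM.
       This is the default for a single-server Nutch setup. -->
  <property>
    <name>mapred.job.tracker</name>
    <value>local</value>
  </property>
  <!-- A larger sort buffer and merge factor mean fewer spill/merge passes
       during sorting, which is often where single-machine reduces lose time. -->
  <property>
    <name>io.sort.mb</name>
    <value>200</value>
  </property>
  <property>
    <name>io.sort.factor</name>
    <value>100</value>
  </property>
  <!-- Give each task JVM enough heap to hold the sort buffer. -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>
</configuration>
```

If the reduces still dominate, one option is running Hadoop pseudo-distributed on the same box (a real jobtracker/tasktracker pair instead of the local runner), which lets a multi-CPU machine overlap several tasks at the cost of some daemon overhead.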

-- 
View this message in context: 
http://www.nabble.com/need-help-to-speed-up-map-reduce-tf2585254.html#a7211011
Sent from the Nutch - Dev mailing list archive at Nabble.com.
