I've been planning to spend some time looking at this, but haven't gotten round to it yet. I see the same (serious) performance problem on a single-machine setup: reduce takes quite a bit longer than the fetch (map) operation in my case, and this is on a very fast 4-CPU machine with a ton of memory. It just doesn't seem like it should take this long. I'm using 0.8 plus some patches and local mods.
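For what it's worth, the knobs I've been meaning to experiment with are the sort/merge buffers in hadoop-site.xml. The property names below are from the Hadoop 0.x line that Nutch 0.8/0.9 bundles, and the values are illustrative guesses, not tested recommendations:

```xml
<!-- hadoop-site.xml: a sketch of settings that may help a single-machine
     setup. Values are guesses for a box with plenty of RAM; untested. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>local</value>
    <!-- run map-reduce in-process via LocalJobRunner; no daemon overhead -->
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>200</value>
    <!-- larger in-memory sort buffer, so maps spill to disk less often -->
  </property>
  <property>
    <name>io.sort.factor</name>
    <value>100</value>
    <!-- merge more spill/segment files per pass during the sort -->
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>65536</value>
    <!-- bigger I/O buffers for sequence-file reads/writes -->
  </property>
</configuration>
```

If the time really is going into sorting and merging segment data on local disk, these are the settings I'd look at first; profiling the reduce phase should tell us whether it's sort I/O or something else.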
If you find some things, please let me know. Likewise, when I get round to it, I will post my findings.

Thanks,
Doug

AJ Chen-2 wrote:
>
> Sorry for repeating this question, but I have to find a solution;
> otherwise the crawling is too slow to be practical. I'm using nutch
> 0.9-dev on one linux server to crawl millions of pages. The fetching
> itself is reasonable, but the map-reduce operations are killing the
> performance. For example, fetching takes 10 hours and map-reduce also
> takes 10 hours, which makes the overall performance very slow. Can
> anyone share experience on how to speed up map-reduce for single-server
> crawling? A single server uses the local file system, so it should
> spend very little time doing map and reduce, shouldn't it?
>
> Thanks,
> --
> AJ Chen, PhD
> http://web2express.org

--
View this message in context: http://www.nabble.com/need-help-to-speed-up-map-reduce-tf2585254.html#a7211011
Sent from the Nutch - Dev mailing list archive at Nabble.com.
