Doug Cook wrote:
I've been planning to spend some time looking at this, but haven't gotten
round to it yet -- I see the same (serious) performance problems on a
single-machine setup: reduce takes quite a bit longer than the fetch (map)
operation in my case, and this is on a very fast 4-CPU machine with a ton of
memory. It just doesn't seem like it should take this long. I'm using 0.8 +
some patches & local mods.

If you find some things, please let me know. Likewise, when I get round to
it, I will post my findings.

I was talking about this slowness months ago, so I'm glad someone else has the same problems. We also run on a single machine, and the reduce task takes hours to complete. The funny thing is that the CPU is loaded at 100%, yet when we run searches on this server there is no difference in search speed. Still, it would be great if things went faster.

When fetching I get 20 to 30 pages per second, but then I have to wait for the reduce task to finish. I tried debug logging, and the only thing I can see is a gap of about 1 to 3 seconds between reduce log messages. I know that map/reduce is meant to be used across multiple nodes.
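One hedged thought, not a verified fix: under Hadoop's local job runner the reduce-side sort/merge is disk-bound, and the 0.x-era sort buffers are small by default. The snippet below is only a sketch using Hadoop 0.x property names for conf/hadoop-site.xml (as bundled with Nutch 0.8); the values are illustrative and the defaults vary by version:

  <property>
    <name>io.sort.mb</name>
    <!-- in-memory buffer used while sorting map output; stock default is around 100 MB -->
    <value>200</value>
  </property>
  <property>
    <name>io.sort.factor</name>
    <!-- number of spill files merged in one pass; stock default is 10 -->
    <value>50</value>
  </property>

In local mode everything runs in one JVM, so the heap would have to be raised on the client side (e.g. the NUTCH_HEAPSIZE variable read by bin/nutch) rather than via mapred.child.java.opts, which only applies to separately spawned task JVMs.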

regards

Uros
Thanks,

Doug



AJ Chen-2 wrote:
Sorry for repeating this question, but I have to find a solution; otherwise
the crawling is too slow to be practical. I'm using nutch 0.9-dev on one
Linux server to crawl millions of pages. The fetching itself is reasonable,
but the map-reduce operations are killing the performance. For example,
fetching takes 10 hours and map-reduce also takes 10 hours, which makes the
overall performance very slow. Can anyone share experience on how to speed
up map-reduce for single-server crawling? A single server uses the local
file system, so it should spend very little time doing map and reduce,
shouldn't it?

Thanks,
--
AJ Chen, PhD
http://web2express.org
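Regarding the single-server question above, one hedged avenue (property names are Hadoop 0.x-era; host/port values are purely illustrative): with mapred.job.tracker left at "local", Hadoop's LocalJobRunner runs the whole job serially in a single JVM with a single reduce, so extra CPUs sit idle. A pseudo-distributed setup on the same box lets several task JVMs run in parallel. A minimal conf/hadoop-site.xml sketch:

  <configuration>
    <property>
      <name>fs.default.name</name>
      <!-- single-node DFS on the same machine; later Hadoop releases expect an hdfs:// URI here -->
      <value>localhost:9000</value>
    </property>
    <property>
      <name>mapred.job.tracker</name>
      <!-- any value other than "local" switches off the serial LocalJobRunner -->
      <value>localhost:9001</value>
    </property>
    <property>
      <name>mapred.map.tasks</name>
      <value>4</value>
    </property>
    <property>
      <name>mapred.reduce.tasks</name>
      <value>4</value>
    </property>
  </configuration>

Whether this actually beats the local runner for a one-box crawl depends on how much of the time goes to sorting versus DFS overhead, so treat it as something to measure rather than a recommendation.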



