While running generate, we noticed that CPU usage was at 100%.

We traced the problem to the "urlfilter-regex" plugin in Nutch.
After replacing it with our own URL filter plugin, we are seeing execution
times comparable to those with Hadoop 0.4.
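The thread does not show the replacement plugin. The following is only a rough sketch of the general idea: swap per-URL regex matching for cheap string and hash-set checks. The interface `SimpleUrlFilter` and the class `DomainUrlFilter` are hypothetical. They are loosely modeled on the Nutch URL filter contract, where a filter returns the URL to keep it or null to drop it. The domain allow-list is an assumed policy and is not taken from the original message.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a Nutch-style URL filter contract:
// return the URL to keep it, or null to drop it.
interface SimpleUrlFilter {
    String filter(String url);
}

// Hypothetical filter: accepts http/https URLs whose host is an allowed
// domain or a subdomain of one. Uses hash-set lookups per host suffix
// instead of evaluating a list of regular expressions for every URL.
class DomainUrlFilter implements SimpleUrlFilter {
    private final Set<String> allowedDomains;

    DomainUrlFilter(Set<String> allowedDomains) {
        this.allowedDomains = allowedDomains;
    }

    @Override
    public String filter(String url) {
        String scheme;
        String host;
        try {
            URI uri = new URI(url);
            scheme = uri.getScheme();
            host = uri.getHost();
        } catch (URISyntaxException e) {
            return null; // unparseable URLs are dropped
        }
        if (scheme == null || host == null) return null;
        scheme = scheme.toLowerCase(Locale.ROOT);
        if (!scheme.equals("http") && !scheme.equals("https")) return null;

        // Walk up the host name: www.example.com -> example.com -> com
        host = host.toLowerCase(Locale.ROOT);
        while (true) {
            if (allowedDomains.contains(host)) return url;
            int dot = host.indexOf('.');
            if (dot < 0) return null;
            host = host.substring(dot + 1);
        }
    }
}
```

For example, `new DomainUrlFilter(Set.of("example.com")).filter("http://www.example.com/a")` keeps the URL. The same filter drops `http://other.org/` and `ftp://example.com/`. Each URL costs one parse plus a few set lookups, no matter how many domains are allowed. A long regex list, by contrast, is tried pattern by pattern on every URL.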

Thanks
Manish 

-----Original Message-----
From: Eric Baldeschwieler [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 10, 2006 4:19 PM
To: [email protected]
Subject: Re: job with Hadoop 0.5 is running much slower than 0.4

Was there a change in behavior between releases, or are you just tuning?

On Aug 10, 2006, at 10:08 AM, Kalbande, Manish wrote:

> Hi,
>
> I have a cluster of 21 nodes plus 1 name node.
>
> Running "generate" on a crawlDB of 1 billion URLs, 700 million of
> them unfetched, took more than 12 hours (most of the time was spent
> in map tasks), while the same job takes close to 1 hour with
> Hadoop 0.4.
>
> I have not changed any configuration; I only added the new
> properties introduced in 0.5.
>
> Are there any (new) properties which I can tweak?
>
> Thanks
> Manish
>
