While running generate, we noticed that the CPU was at 100%. We traced the problem to the urlfilter-regex plugin in Nutch. After replacing it with our own URL filter plugin, we are seeing execution times comparable to those we had with Hadoop 0.4.
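For illustration only: the thread does not describe the replacement plugin, so the following is a hypothetical Java sketch of a regex-free URL filter, not the plugin used here. It assumes the contract of Nutch's URLFilter interface (return the URL to accept it, or null to reject it), but it is a standalone class rather than a Nutch plugin. The class name SimpleUrlFilter and the prefix/suffix rules are invented for the example. Plain startsWith/endsWith checks do a small, fixed amount of work per rule and avoid running regular expressions against every URL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical regex-free URL filter. Mirrors the Nutch URLFilter contract
 * (return the URL to keep it, null to drop it) but does not implement the
 * Nutch plugin interfaces.
 */
public class SimpleUrlFilter {
    private final List<String> allowedPrefixes;
    private final List<String> rejectedSuffixes;

    public SimpleUrlFilter(List<String> allowedPrefixes, List<String> rejectedSuffixes) {
        // Normalize rules to lower case once, so each URL check is case-insensitive.
        this.allowedPrefixes = lowerAll(allowedPrefixes);
        this.rejectedSuffixes = lowerAll(rejectedSuffixes);
    }

    private static List<String> lowerAll(List<String> in) {
        List<String> out = new ArrayList<>(in.size());
        for (String s : in) {
            out.add(s.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    /** Returns the URL if it passes all rules, or null if it should be rejected. */
    public String filter(String url) {
        if (url == null) {
            return null;
        }
        String lower = url.toLowerCase(Locale.ROOT);
        // The URL must start with at least one allowed prefix (e.g. a scheme).
        boolean prefixOk = false;
        for (String p : allowedPrefixes) {
            if (lower.startsWith(p)) {
                prefixOk = true;
                break;
            }
        }
        if (!prefixOk) {
            return null;
        }
        // Reject URLs ending in any unwanted suffix (e.g. binary file types).
        for (String s : rejectedSuffixes) {
            if (lower.endsWith(s)) {
                return null;
            }
        }
        return url;
    }

    public static void main(String[] args) {
        SimpleUrlFilter f = new SimpleUrlFilter(
                List.of("http://", "https://"),
                List.of(".jpg", ".gif", ".zip"));
        System.out.println(f.filter("http://example.com/index.html")); // prints http://example.com/index.html
        System.out.println(f.filter("http://example.com/logo.GIF"));   // prints null
        System.out.println(f.filter("ftp://example.com/file.txt"));    // prints null
    }
}
```

Whether a filter like this is fast enough, and whether prefix/suffix rules are expressive enough, depends on the crawl's actual filtering needs.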
Thanks,
Manish

-----Original Message-----
From: Eric Baldeschwieler [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 10, 2006 4:19 PM
To: [email protected]
Subject: Re: job with Hadoop 0.5 is running much slower than 0.4

Was there a change in behavior between releases or are you just tuning?

On Aug 10, 2006, at 10:08 AM, Kalbande, Manish wrote:

> Hi,
>
> I have a cluster of 21 nodes + 1 name node.
>
> To perform "generate" on crawlDB of size 1 Billion urls with 700
> million unfetched, it took more than 12 hours (most of the time was
> taken by Map tasks), while same thing takes close to 1 hour using
> hadoop 0.4.
>
> I have not changed any configuration, just added the additional
> properties which was added in 0.5.
>
> Are there any (new) properties which I can tweak?
>
> Thanks
> Manish
>
