Hi,
 
I have a cluster of 21 nodes + 1 name node.
 
Running "generate" on a crawlDB of 1 billion URLs, 700 million of them
unfetched, took more than 12 hours (most of that time was spent in the
Map tasks), while the same job takes close to 1 hour using Hadoop 0.4.
 
I have not changed any configuration; I only added the additional
properties that were introduced in 0.5.
 
Are there any (new) properties that I can tweak?
 
Thanks
Manish
 
