I'm using Hadoop (v0.4.0-patched) with Nutch (v0.8.1). I have 2 servers, and I'm seeing some odd behavior in task allocation: sometimes one server gets all the work and the other does nothing.
When I put mapred.map.tasks=2 and mapred.reduce.tasks=2 into hadoop-site.xml, sometimes I get 1 task on each server (good), and sometimes 2 tasks on the same server (lack of speed!). So I tried more tasks, mapred.map.tasks=4 and mapred.reduce.tasks=4, and now I always have 2 tasks running on each server (great)... but sometimes the tasks on one server are much smaller than on the other (far fewer URLs to fetch), so one server finishes early and the whole job takes longer than it should.

I could understand some load-balancing process if the servers didn't have the same power, but here the 2 servers are exactly identical (Linux Ubuntu, Xeon, 2 GB RAM, 100 Mb network).

Can someone explain to me how Hadoop allocates the tasks over the nodes? And can I tweak/modify/configure it?
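For reference, the relevant part of my hadoop-site.xml looks roughly like this (a sketch of the second configuration I tried, the one that keeps 2 tasks running on each server; the property names are the ones mentioned above, wrapped in the standard hadoop-site.xml syntax):

    <!-- hadoop-site.xml: sketch of the settings described above -->
    <configuration>
      <property>
        <name>mapred.map.tasks</name>
        <value>4</value>
        <!-- hint for the number of map tasks per job -->
      </property>
      <property>
        <name>mapred.reduce.tasks</name>
        <value>4</value>
        <!-- number of reduce tasks per job -->
      </property>
    </configuration>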
