I'm using Hadoop (v0.4.0-patched) with Nutch (v0.8.1).

I have 2 servers, and some odd behavior in task allocation: sometimes
one server has all the work and the other does nothing.


When I put this into hadoop-site.xml:
mapred.map.tasks=2
mapred.reduce.tasks=2

sometimes I get 1 task on each server (ok), and sometimes 2 tasks on the
same server (lack of speed!)
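
For reference, here is how I set them, using the usual
<property>/<name>/<value> syntax of hadoop-site.xml (the 4-task test
below just changes both values):

<property>
  <name>mapred.map.tasks</name>
  <value>2</value>   <!-- treated as a hint; the actual map count depends on the input splits -->
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <value>2</value>   <!-- actual number of reduce tasks per job -->
</property>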

So I tried to put more tasks:
mapred.map.tasks=4
mapred.reduce.tasks=4

and now I always have 2 tasks running on each server (great)
... but sometimes the tasks on one server are much smaller than on the other
(far fewer URLs to fetch),
so the whole job takes more time to finish.

I assume there is a load-balancing process for when the servers don't have
the same power, but here the 2 servers are exactly the same! (Linux Ubuntu,
Xeon, 2 GB RAM, 100 Mb network)


Can someone explain to me how the Hadoop process that allocates the tasks
over the nodes works?
And can I tweak/modify/configure it?


