Re: Multiple tasktrackers per node

Ben Reed Thu, 25 May 2006 09:19:00 -0700

My task_zoom.patch fixes "the 10 sec delay before getting another task when a task completes" bug. It is a rather minor part of the task_zoom.patch. Basically, the TaskTracker updates the JobTracker as soon as the task completes. The patch also fixes another bug in the JobTracker that made it count all tasks rather than just the running tasks, which could cause a delay longer than 10 secs in some cases.

ben
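
To illustrate the difference between counting all tasks and counting only the running ones, here is a minimal sketch. It assumes the count is used to decide how many free slots a tracker has, which the message above does not actually state. The class, enum, and method names are hypothetical and are not taken from Hadoop or from task_zoom.patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model for illustration only; not Hadoop code.
final class SlotCountSketch {
    enum TaskState { RUNNING, SUCCEEDED, FAILED }

    // Counts every reported task, finished or not, so completed
    // tasks still appear to occupy slots.
    static int freeSlotsCountingAll(List<TaskState> reported, int maxTasks) {
        return Math.max(0, maxTasks - reported.size());
    }

    // Counts only the tasks that are still running.
    static int freeSlotsCountingRunning(List<TaskState> reported, int maxTasks) {
        int running = 0;
        for (TaskState state : reported) {
            if (state == TaskState.RUNNING) {
                running++;
            }
        }
        return Math.max(0, maxTasks - running);
    }

    public static void main(String[] args) {
        // Assumed report: three finished tasks and one still running, maximum of 4.
        List<TaskState> reported = Arrays.asList(
                TaskState.SUCCEEDED, TaskState.SUCCEEDED,
                TaskState.SUCCEEDED, TaskState.RUNNING);
        System.out.println(freeSlotsCountingAll(reported, 4));
        System.out.println(freeSlotsCountingRunning(reported, 4));
    }
}
```

Under these assumptions, counting all four reported tasks leaves no free slot, while counting only the running task leaves three, so a tracker in the first case would wait for work even though it is mostly idle.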


On May 25, 2006, at 8:57 AM, Doug Cutting wrote:

Gianlorenzo Thione wrote:
Thanks for the answer. So far I am still trying to understand how each tasktracker gets multiple map or reduce tasks to be executed simultaneously. I have run a simple job with 53 map tasks on 5 nodes, and at all times each node was executing a single task. Each cluster node is a 4 core machine, so theoretically this was a 16-node cluster and I feel that the resources were actually underutilized. Am I missing something? Is there a parameter for a minimum number of tasks to be executed in parallel (I found a parameter for setting a maximum [which I set to 4])? If I run 4 TaskTrackers per node then each node gets a map task at the same time and execution seems overall much faster.
The task tracker can currently get starved for work when tasks complete too quickly. This is a bug that will hopefully be fixed soon. The problem is that the task tracker only polls for a new task once per heartbeat (10 seconds). Instead it should poll for new tasks as soon as tasks complete. As a short-term workaround you can decrease the heartbeat interval to one second in MRConstants.java. With smaller clusters (< 100 machines) that should not cause any problems.
Doug
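
The starvation described above can be sketched with a small model. This is only an illustration under stated assumptions: a single task slot, identical task durations, and heartbeats that fire at exact multiples of the interval. The class and method names are hypothetical and this is not the TaskTracker code.

```java
// Hypothetical single-slot model for illustration only; not Hadoop code.
final class HeartbeatSketch {

    // Time in ms to finish `tasks` tasks when a new task can only
    // start on a heartbeat tick (0, heartbeatMillis, 2 * heartbeatMillis, ...).
    static long finishPollingOnHeartbeat(int tasks, long taskMillis, long heartbeatMillis) {
        long now = 0;
        for (int i = 0; i < tasks; i++) {
            // Round up to the first tick at or after the slot became free.
            long ticks = (now + heartbeatMillis - 1) / heartbeatMillis;
            long start = ticks * heartbeatMillis;
            now = start + taskMillis;
        }
        return now;
    }

    // Time in ms to finish `tasks` tasks when the next task starts
    // as soon as the previous one completes.
    static long finishPollingOnCompletion(int tasks, long taskMillis) {
        return (long) tasks * taskMillis;
    }

    public static void main(String[] args) {
        // Five 3-second tasks with a 10-second heartbeat.
        System.out.println(finishPollingOnHeartbeat(5, 3000, 10000));
        // The same tasks with the one-second heartbeat workaround.
        System.out.println(finishPollingOnHeartbeat(5, 3000, 1000));
        // The same tasks when polling on completion.
        System.out.println(finishPollingOnCompletion(5, 3000));
    }
}
```

In this model, five 3-second tasks take 43 seconds with a 10-second heartbeat, because the slot sits idle for 7 seconds after each task. With a one-second heartbeat, or with polling on completion, they take 15 seconds.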
