OK, with my last change I've run the writer benchmark twice with no
(fatal) problems. But the difference between the two runs is
interesting... On the same 189-node cluster, the first time I had
30-some "ghost" task trackers (i.e., the JobTracker started, then a
TaskTracker stopped and restarted, so you get two tracker_node1100_<id>
entries with different ids, one of which never delivers a heartbeat and
never requests any tasks). The ghost trackers tricked the scheduler
into thinking that the cluster wasn't very busy, so it never scheduled
more than one task on any node. The results are:
run1 (1 task/node; 189 nodes; 1890 maps writing 1 GB of DFS data):
  time: 5405 seconds
  task failures: 0
run2 (4 tasks/node; 189 nodes; 1890 maps writing 1 GB of DFS data):
  time: 5785 seconds
  task failures: 173
That is still a lot of failures, but at least none of them cascaded
into killing the entire job.
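
To make the ghost-tracker effect concrete, here is a toy illustration
(not the real JobTracker code; the class and the exact load check are
invented): with a load-average style heuristic, the ~30 dead trackers
pull the apparent average below one running task per node, so a node
that already has one task never looks idle enough to get a second.

import java.util.ArrayList;
import java.util.List;

// Toy model only; not the actual JobTracker scheduling code.
public class GhostTrackerDemo {

  // Hypothetical tracker record: name plus current running task count.
  static class Tracker {
    final String name;
    final int runningTasks;
    Tracker(String name, int runningTasks) {
      this.name = name;
      this.runningTasks = runningTasks;
    }
  }

  public static void main(String[] args) {
    List<Tracker> trackers = new ArrayList<Tracker>();
    // 189 live trackers, each already running one task...
    for (int i = 0; i < 189; i++) {
      trackers.add(new Tracker("tracker_node" + i + "_live", 1));
    }
    // ...plus ~30 ghosts that registered once and never ran anything.
    for (int i = 0; i < 30; i++) {
      trackers.add(new Tracker("tracker_node" + i + "_ghost", 0));
    }

    // Load-average style check: only hand a tracker another task if it
    // is at or below the cluster-wide average number of running tasks.
    int total = 0;
    for (Tracker t : trackers) {
      total += t.runningTasks;
    }
    double avgLoad = (double) total / trackers.size();  // ~0.86 with ghosts

    Tracker live = trackers.get(0);
    boolean giveSecondTask = live.runningTasks <= avgLoad;
    // Prints false: the ghosts drag the average below 1, so no node ever
    // gets a second task. Without the ghosts the average is exactly 1.0
    // and the check passes.
    System.out.println("average load = " + avgLoad
        + ", assign another task to " + live.name + "? " + giveSecondTask);
  }
}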
We need to time out TaskTrackers sooner (10 minutes? 30 minutes?)
We should probably use pending tasks rather than "current load" for
determining when to give out new tasks.
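
A rough sketch of those last two points (not a patch; the class and
field names are invented and the real JobTracker bookkeeping is
organized differently):

import java.util.ArrayList;
import java.util.List;

// Rough sketch only; names are invented, not the real JobTracker classes.
public class TrackerBookkeepingSketch {

  // Hypothetical per-tracker status, as reported by heartbeats.
  static class TrackerStatus {
    String name;
    long lastHeartbeat;  // wall-clock time of the last heartbeat we saw
    int runningTasks;    // tasks currently running on this tracker
    int maxTasks;        // slots this tracker is willing to run (2-4)
  }

  // Idea 1: expire trackers that have gone silent, so ghosts stop
  // counting. 10 minutes is a straw-man value.
  static final long TRACKER_EXPIRY_MS = 10 * 60 * 1000L;

  static List<TrackerStatus> liveTrackers(List<TrackerStatus> all, long now) {
    List<TrackerStatus> live = new ArrayList<TrackerStatus>();
    for (TrackerStatus t : all) {
      if (now - t.lastHeartbeat <= TRACKER_EXPIRY_MS) {
        live.add(t);
      }
      // else: drop it, and eventually reschedule whatever it was running
    }
    return live;
  }

  // Idea 2: decide on new tasks from the job's pending-task count and the
  // tracker's own free slots, instead of a cluster-wide "current load".
  static boolean shouldAssign(TrackerStatus tracker, int pendingTasks) {
    return pendingTasks > 0 && tracker.runningTasks < tracker.maxTasks;
  }

  public static void main(String[] args) {
    TrackerStatus t = new TrackerStatus();
    t.name = "tracker_node1100_x";
    t.lastHeartbeat = System.currentTimeMillis();
    t.runningTasks = 1;
    t.maxTasks = 3;
    List<TrackerStatus> all = new ArrayList<TrackerStatus>();
    all.add(t);
    List<TrackerStatus> live = liveTrackers(all, System.currentTimeMillis());
    System.out.println("live trackers: " + live.size()
        + ", assign? " + shouldAssign(t, 1889 /* pending maps */));
  }
}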
The 2-CPU Intel boxes are getting overloaded at 4 tasks/node; I should
probably back off at least to 3 (or all the way back to the Hadoop
default of 2).
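
For reference, the knob I mean is the per-node cap the TaskTracker
reads from the site config; I believe the property is
mapred.tasktracker.tasks.maximum with a default of 2, but take the
name as from memory:

import org.apache.hadoop.conf.Configuration;

public class ShowTaskCap {
  public static void main(String[] args) {
    // Assumption: the per-node cap is "mapred.tasktracker.tasks.maximum"
    // (default 2). Backing off from 4 would mean setting it to 3 (or
    // removing the override) in hadoop-site.xml on the tracker nodes.
    Configuration conf = new Configuration();
    int maxTasks = conf.getInt("mapred.tasktracker.tasks.maximum", 2);
    System.out.println("tasks/node = " + maxTasks);
  }
}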
-- Owen