Hi all,

In an ideal world, my TaskTrackers would be working for me all the time.
That is, over any given time period, the average number of tasks a TaskTracker is running would be close to 'mapred.tasktracker.tasks.maximum'.
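One way to make this ideal precise is the formula below. It is a sketch in notation I am introducing here, not anything Hadoop reports itself: N is the number of TaskTrackers, M the value of 'mapred.tasktracker.tasks.maximum', r(t) the number of tasks running cluster-wide at time t, and task i runs from s_i to e_i, entirely inside the window [t_0, t_1].

```latex
U = \frac{1}{N M (t_1 - t_0)} \int_{t_0}^{t_1} r(t)\,dt
  = \frac{\sum_i (e_i - s_i)}{N M (t_1 - t_0)}
```

U = 1 means every slot was busy for the whole window. The second form holds because r(t) simply counts the tasks whose interval contains t, so integrating r(t) adds up each task's running time.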

But... it might not always be possible to feed the slaves enough tasks. Sometimes one job has to finish before another can start, and if some slaves finish their tasks faster than others (faster machines, smaller tasks), they sit idle until the others complete theirs.

Is there a way to easily determine the efficiency of my cluster?
Example:
- there are 5 slaves, each of which can handle 1 task at a time
- there is one job, split into 5 subtasks (5 maps and 5 reduces, i.e. one map and one reduce per slave)
- 4 slaves finish their tasks in 1 minute
- 1 slave finishes its tasks in 2 minutes (so the other 4 slaves wait idle for 1 minute)

... then one could say that the cluster utilization is 60%: 6 working slave-minutes (4 × 1 + 1 × 2) out of 10 available (5 slaves × 2 minutes), with the remaining 4 minutes spent waiting.
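The calculation above can be sketched in Python. This is a minimal illustration and assumes you have already collected a start and finish time for each task (or for each slave's busy period) by some other means. That collection step is not shown. The function name cluster_utilization is hypothetical and not part of Hadoop. The observation window is taken to run from the first start to the last finish.

```python
def cluster_utilization(task_intervals, total_slots):
    """Return the fraction of available slot-time spent running tasks.

    Hypothetical helper (not a Hadoop API).
    task_intervals: (start, end) pairs, one per task or per slave's
        busy period, all in the same time unit (e.g. minutes).
    total_slots: number of slaves x tasks each slave may run at once.
    """
    if not task_intervals or total_slots <= 0:
        return 0.0
    # Busy slot-time: the sum of every interval's length.
    busy = sum(end - start for start, end in task_intervals)
    # Observation window: first start to last finish (an assumption;
    # pass a wider window yourself to count idle time between jobs).
    span = max(end for _, end in task_intervals) - min(start for start, _ in task_intervals)
    if span <= 0:
        return 0.0
    # Available slot-time is total_slots * span.
    return busy / (total_slots * span)


# The example above: 4 slaves busy for minutes 0-1, 1 slave busy for 0-2.
example = [(0, 1)] * 4 + [(0, 2)]
print(cluster_utilization(example, total_slots=5))  # 0.6
```

Dividing total busy time by slots × window length gives the same result as averaging the number of running tasks over the window and dividing by the slot count.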

Mathijs

--
Knowlogy
Helperpark 290 C
9723 ZA Groningen

+31 (0)6 15312977
http://www.knowlogy.nl

