Hi,
I have been trying to understand the JobTracker code.
Hmm, more than 1000 lines of code in just one class. :-( This makes the code very difficult to understand.

Anyway, I'm missing a mechanism to re-execute hanging tasks. Maybe I just didn't find the code, but I did spend some time looking for it. As the Google paper describes, the original MapReduce re-executes tasks that are still running but are much slower than the other tasks, for example because of hardware failures. Since I notice that the TaskTracker isn't that stable yet, I would really love to have such a re-execution mechanism. I have seen that tasks are re-executed when a TaskTracker crashes and stops sending reports, or when a TaskTracker reports a task failure. But if, for example, the network speed of a task fetching map data is very, very slow, the whole job takes forever.

I would suggest adding a start time and a finish time to the task object and setting these values whenever the task status changes. Based on these values we can calculate the average time a task needs for processing. Then we have a configurable minimum fraction of finished tasks that must be reached before we start re-executing tasks, for example 80% of the tasks need to be done. Furthermore, we have a configurable threshold: if a task's processing time exceeds threshold * average processing time, we simply re-execute the task on another TaskTracker.
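To make the proposal concrete, here is a rough sketch of the selection rule only. All names (SpeculativeTaskSelector, TaskTiming, minFinishedFraction, slownessThreshold) are hypothetical and do not exist in the JobTracker; times are assumed to be wall-clock milliseconds, and a finish time of -1 is assumed to mean "still running".

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the proposed re-execution rule; not JobTracker code.
 */
final class SpeculativeTaskSelector {

    /** Hypothetical per-task timing record; finishMillis stays -1 while the task runs. */
    static final class TaskTiming {
        final String taskId;
        final long startMillis;
        long finishMillis = -1;

        TaskTiming(String taskId, long startMillis) {
            this.taskId = taskId;
            this.startMillis = startMillis;
        }

        boolean isFinished() {
            return finishMillis >= 0;
        }
    }

    private final double minFinishedFraction; // e.g. 0.8: 80% of tasks must be done first
    private final double slownessThreshold;   // e.g. 2.0: re-run tasks slower than 2x average

    SpeculativeTaskSelector(double minFinishedFraction, double slownessThreshold) {
        this.minFinishedFraction = minFinishedFraction;
        this.slownessThreshold = slownessThreshold;
    }

    /** Returns the IDs of running tasks that should be re-executed on another TaskTracker. */
    List<String> selectTasksToReexecute(List<TaskTiming> tasks, long nowMillis) {
        List<String> result = new ArrayList<>();
        if (tasks.isEmpty()) {
            return result;
        }

        // Average processing time over the tasks that have already finished.
        long totalFinishedMillis = 0;
        int finished = 0;
        for (TaskTiming t : tasks) {
            if (t.isFinished()) {
                totalFinishedMillis += t.finishMillis - t.startMillis;
                finished++;
            }
        }

        // Do nothing until the configured fraction of tasks has finished.
        if (finished == 0 || (double) finished / tasks.size() < minFinishedFraction) {
            return result;
        }
        double averageMillis = (double) totalFinishedMillis / finished;

        // A running task whose elapsed time exceeds threshold * average is a candidate.
        for (TaskTiming t : tasks) {
            if (!t.isFinished()) {
                long elapsedMillis = nowMillis - t.startMillis;
                if (elapsedMillis > slownessThreshold * averageMillis) {
                    result.add(t.taskId);
                }
            }
        }
        return result;
    }
}
```

With minFinishedFraction = 0.8 and slownessThreshold = 2.0, four tasks that each finished in 100 ms and one task still running since time 0 would make the running task a candidate once more than 200 ms have elapsed. The sketch only decides which tasks look slow; actually scheduling the backup copy on a different TaskTracker and keeping whichever attempt finishes first would still have to be wired into the JobTracker.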

What do people think?
Did I miss the section in the JobTracker where this is done, or would people be interested in a patch that implements this mechanism?

Stefan
