Hi,
I have been trying to understand the jobtracker code.
Hmm, more than 1000 lines of code in just one class. :-( This makes
the code very difficult to understand.
Anyway, I'm missing a mechanism to reprocess hanging tasks. Maybe I just
didn't find the code, but I did invest some time looking for it.
As the Google paper describes, the original MapReduce reprocesses tasks
that may still be running but are much slower than the other tasks
because of some hardware failure.
Since I have noticed that the task-tracker isn't that stable yet, I would
really love to have such a reprocessing mechanism.
Actually, I have seen that tasks are reprocessed in case the task-tracker
crashes and does not return any reports anymore, or the task-tracker
reports a task failure.
But, for example, in case the network speed of a fetching map task
is very, very slow, the job itself takes forever.
I would suggest adding a start time and a finish time to the task object
and setting these values when the status changes.
We can calculate the average time a task needs for processing based on
these values.
Then we have a configurable value for the minimum share of finished tasks
before we start reprocessing tasks. For example, 80% of the tasks need to
be finished.
Furthermore, we have a configurable threshold value: in case the
processing time of a task exceeds threshold * average processing time, we
just reprocess the task on another tasktracker.
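To make the idea concrete, here is a minimal sketch of the check I have in mind. It is not taken from the existing jobtracker code: the class names SlowTaskDetector and TaskTiming, the method findTasksToReprocess, and the parameters minFinishedFraction and threshold are all hypothetical names I made up for illustration. It assumes times are in milliseconds and that a finish time of -1 marks a task that is still running.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names exist in the jobtracker.
final class SlowTaskDetector {

    /** Hypothetical stand-in for the task object with the two proposed fields. */
    static final class TaskTiming {
        final String taskId;
        final long startTimeMs;
        final long finishTimeMs; // assumption: -1 while the task is still running

        TaskTiming(String taskId, long startTimeMs, long finishTimeMs) {
            this.taskId = taskId;
            this.startTimeMs = startTimeMs;
            this.finishTimeMs = finishTimeMs;
        }

        boolean isFinished() {
            return finishTimeMs >= 0;
        }
    }

    /**
     * Returns the ids of running tasks whose elapsed time exceeds
     * threshold * average processing time of the finished tasks.
     * Returns an empty list until at least minFinishedFraction
     * (for example 0.8) of all tasks have finished.
     */
    static List<String> findTasksToReprocess(List<TaskTiming> tasks, long nowMs,
                                             double minFinishedFraction, double threshold) {
        List<String> slowTasks = new ArrayList<>();
        if (tasks.isEmpty()) {
            return slowTasks;
        }

        // Sum up the processing time of all finished tasks.
        long finishedCount = 0;
        long totalFinishedMs = 0;
        for (TaskTiming t : tasks) {
            if (t.isFinished()) {
                finishedCount++;
                totalFinishedMs += t.finishTimeMs - t.startTimeMs;
            }
        }

        // Not enough finished tasks yet to trust the average.
        if (finishedCount == 0
                || (double) finishedCount / tasks.size() < minFinishedFraction) {
            return slowTasks;
        }

        double averageMs = (double) totalFinishedMs / finishedCount;

        // Flag running tasks that are already much slower than the average.
        for (TaskTiming t : tasks) {
            if (!t.isFinished() && (nowMs - t.startTimeMs) > threshold * averageMs) {
                slowTasks.add(t.taskId);
            }
        }
        return slowTasks;
    }
}
```

The jobtracker would then hand the returned tasks to another tasktracker. Whether the slow original should be killed or left running until one of the two copies finishes is a separate decision that this sketch does not cover.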
What do people think?
Did I miss the section in the jobtracker where this is done, or are
people interested in me submitting a patch that implements this mechanism?
Stefan