Doug,
I have definitely run into this problem several times: tasktrackers were
still sending heartbeat messages but were no longer processing the job.
For example, no new pages were being fetched, and the pages/sec.
statistic dropped lower and lower.
I personally think it would make more sense for the jobtracker to
decide whether a task has exceeded the average processing time and
needs to be re-executed.
The last section of the Google paper covers this issue; they report
performance improvements from re-executing tasks that run over a
specific time.
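Roughly the kind of check I have in mind, as a sketch (all of the names
below are hypothetical, nothing of this exists in the code base):

import java.util.List;

public class StragglerCheck {

  // Hypothetical view of a running task.
  public interface RunningTask {
    long elapsedMillis();
    boolean isComplete();
  }

  // Returns true if a still-running task has taken much longer than
  // the average of its completed siblings (e.g. a 20x slower machine).
  public static boolean shouldReexecute(RunningTask task,
                                        List<Long> completedDurations,
                                        double slowFactor) {
    if (completedDurations.isEmpty()) {
      return false;                 // nothing to compare against yet
    }
    long sum = 0;
    for (long d : completedDurations) {
      sum += d;
    }
    double average = (double) sum / completedDurations.size();
    return !task.isComplete()
        && task.elapsedMillis() > slowFactor * average;
  }
}

The slow factor could of course be a per-job configuration value.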
Maybe we are misunderstanding each other: I do not mean tasks that
crash, I mean tasks that run 20 times slower on one machine than the
same tasks do on the other machines.
Stefan
On 10.10.2005, at 20:16, Doug Cutting wrote:
Stefan Groschupf wrote:
Am I missing the section in the jobtracker where this is done, or
would people be interested in a patch implementing this mechanism?
This is mostly already implemented. The tasktracker fails tasks
that do not update their status within a configurable timeout.
Task status is updated each time a task reads an input, writes an
output or calls the Reporter.setStatus() method. The jobtracker
will retry failed tasks up to four times.
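For example, a map implementation can keep its status fresh on every
record, along these lines (a sketch only: the package names and map()
signature are from memory, and fetchPage() is a stand-in):

import java.io.IOException;

import org.apache.nutch.io.Writable;
import org.apache.nutch.io.WritableComparable;
import org.apache.nutch.mapred.OutputCollector;
import org.apache.nutch.mapred.Reporter;

// Sketch of a map implementation that updates its status on every
// record, so the tasktracker timeout only fires when a task is truly
// stuck. Mapper boilerplate (configure/close) is omitted.
public class StatusReportingMap {

  public void map(WritableComparable key, Writable value,
                  OutputCollector output, Reporter reporter)
      throws IOException {
    Writable page = fetchPage(key);        // the slow part: fetch & parse
    output.collect(key, page);             // writing output updates status
    reporter.setStatus("fetched " + key);  // explicit status update
  }

  private Writable fetchPage(WritableComparable key) throws IOException {
    return null;                           // stand-in for real fetch code
  }
}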
The mapred-based fetcher also should not hang. It will exit even
when it has hung threads. So the task timeout should be set to the
maximum amount of time that any single page should require to fetch
& parse. By default it is set to 10 minutes.
Doug
---------------------------------------------------------------
company: http://www.media-style.com
forum: http://www.text-mining.org
blog: http://www.find23.net