Doug,
I have definitely run into problems several times where tasktrackers were still sending heartbeat messages but were no longer processing the job. For example, no new pages were being fetched, but the pages/sec. statistic just became slower and slower. I personally think it would make more sense for the jobtracker to decide whether a task has exceeded the average processing time and needs to be re-executed or not. The last section of the Google paper covers this issue, and they report performance improvements from re-executing tasks that run over a specific time.
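To make the idea concrete, here is a minimal sketch (all names and the 1.5x threshold are my own assumptions, not existing jobtracker code) of how such a check could flag tasks whose elapsed time exceeds a multiple of the average:

```java
import java.util.Arrays;

// Hypothetical straggler check: compare each running task's elapsed
// time against the average across all tasks of the job.
public class StragglerCheck {
    // Assumed policy: re-execute a task once it runs longer than
    // FACTOR times the average elapsed time.
    static final double FACTOR = 1.5;

    static boolean[] findStragglers(long[] elapsedMillis) {
        double avg = Arrays.stream(elapsedMillis).average().orElse(0);
        boolean[] straggler = new boolean[elapsedMillis.length];
        for (int i = 0; i < elapsedMillis.length; i++) {
            straggler[i] = elapsedMillis[i] > FACTOR * avg;
        }
        return straggler;
    }

    public static void main(String[] args) {
        // Four tasks; the last one is roughly 20x slower than the rest.
        long[] elapsed = {1000, 1200, 900, 20000};
        System.out.println(Arrays.toString(findStragglers(elapsed)));
        // prints "[false, false, false, true]"
    }
}
```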

Maybe we are misunderstanding each other: I do not mean tasks that crash, I mean tasks that run 20 times slower on one machine than the same tasks on the other machines.

Stefan


On 10.10.2005, at 20:16, Doug Cutting wrote:

Stefan Groschupf wrote:

Did I miss the section in the jobtracker where this is done, or would people be interested in me submitting a patch implementing this mechanism?


This is mostly already implemented. The tasktracker fails tasks that do not update their status within a configurable timeout. Task status is updated each time a task reads an input, writes an output or calls the Reporter.setStatus() method. The jobtracker will retry failed tasks up to four times.
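That timeout logic can be sketched as follows (a simplified illustration under my own naming, not the actual tasktracker code; the 10-minute value matches the default mentioned below):

```java
// Hypothetical sketch of the tasktracker's timeout check: a task that
// has not reported progress within the configured timeout is failed.
public class TaskTimeoutCheck {
    static final long TIMEOUT_MILLIS = 10 * 60 * 1000; // assumed default: 10 minutes

    // lastReportMillis is updated whenever the task reads an input,
    // writes an output, or calls Reporter.setStatus().
    static boolean hasTimedOut(long lastReportMillis, long nowMillis) {
        return nowMillis - lastReportMillis > TIMEOUT_MILLIS;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Last report 11 minutes ago: task should be failed.
        System.out.println(hasTimedOut(now - 11 * 60 * 1000, now)); // prints "true"
        // Last report 5 minutes ago: task is still considered live.
        System.out.println(hasTimedOut(now - 5 * 60 * 1000, now));  // prints "false"
    }
}
```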

The mapred-based fetcher also should not hang. It will exit even when it has hung threads. So the task timeout should be set to the maximum amount of time that any single page should require to fetch & parse. By default it is set to 10 minutes.

Doug



---------------------------------------------------------------
company:        http://www.media-style.com
forum:        http://www.text-mining.org
blog:            http://www.find23.net

