Failure of a tasktracker is another failure mode. If a tasktracker fails by crashing, or running very slowly, it will stop sending heartbeats to the jobtracker (or send them very infrequently). The jobtracker will notice a tasktracker that has stopped sending heartbeats (if it hasn't received one for 10 minutes, configured via the mapred.tasktracker.expiry.interval property, in milliseconds) and remove it from its pool of tasktrackers to schedule tasks on. The jobtracker arranges for map tasks that were run and completed successfully on that tasktracker to be rerun if they belong to incomplete jobs, since their intermediate output residing on the failed tasktracker's local filesystem may not be accessible to the reduce task. Any tasks in progress are also rescheduled.

A tasktracker can also be blacklisted by the jobtracker, even if the tasktracker has not failed. A tasktracker is blacklisted if the number of tasks that have failed on it is significantly higher than the average task failure rate on the cluster. Blacklisted tasktrackers can be restarted to remove them from the jobtracker's blacklist.
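
To make the two checks concrete, here is a rough Python sketch of the bookkeeping described above. It is a toy model, not Hadoop code: the function names are made up, and the min_failures and factor thresholds are assumptions picked for illustration, since the description only says "significantly higher than the average". Only the 10-minute expiry default comes from the text.

```python
# Toy model of the jobtracker's bookkeeping; not actual Hadoop code.

# 10 minutes, the default for mapred.tasktracker.expiry.interval (milliseconds).
EXPIRY_INTERVAL_MS = 10 * 60 * 1000


def expired_trackers(last_heartbeat_ms, now_ms, expiry_ms=EXPIRY_INTERVAL_MS):
    """Return trackers whose last heartbeat is older than the expiry interval."""
    return sorted(
        name for name, seen in last_heartbeat_ms.items()
        if now_ms - seen > expiry_ms
    )


def blacklisted_trackers(failures, min_failures=4, factor=1.5):
    """Return trackers whose failure count is well above the cluster average.

    min_failures and factor are hypothetical thresholds, assumed here only
    to give "significantly higher than the average" a concrete meaning.
    """
    if not failures:
        return []
    average = sum(failures.values()) / len(failures)
    return sorted(
        name for name, count in failures.items()
        if count >= min_failures and count > factor * average
    )


if __name__ == "__main__":
    now = 20 * 60 * 1000
    heartbeats = {"tt1": now - 5_000, "tt2": now - 11 * 60 * 1000}
    print(expired_trackers(heartbeats, now))  # ['tt2']
    print(blacklisted_trackers({"tt1": 1, "tt2": 0, "tt3": 8}))  # ['tt3']
```

In this model, the "blacklisted tasktrackers" metric you are seeing would correspond to the length of the list returned by blacklisted_trackers, so a value above 0 suggests that tasks are failing unusually often on one or more nodes and those nodes are worth checking.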

2012/5/9 John Stein <designersm...@yahoo.com>

> hi,
>
> I saw a metric called blacklisted tasktrackers in the oceansync monitoring
> system, which is usually 0. What does it mean if I saw it go up? could
> you explain BlackListed TaskTrackers?
>
> John Stein
> Processing Engineer
> XTO Energy

--
Regards
Junyong