[ http://issues.apache.org/jira/browse/HADOOP-133?page=comments#action_12374428 ]
Doug Cutting commented on HADOOP-133:
-------------------------------------

Sure, that would be safer, but recall that this communication is all on the same host. A tasktracker shouldn't have more than a handful of children, so per-second pings should not be a great burden. And communication problems to localhost seem unlikely. I've seen nodes with loads over 100, timing out all sorts of requests from other hosts, and I've never seen "Parent died" logged when a tasktracker was really still alive. But, still, it shouldn't hurt to try a few times.

> the TaskTracker.Child.ping thread calls exit
> --------------------------------------------
>
>          Key: HADOOP-133
>          URL: http://issues.apache.org/jira/browse/HADOOP-133
>      Project: Hadoop
>         Type: Bug
>   Components: mapred
>     Versions: 0.1.1
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>
> The TaskTracker.Child.startPinging thread calls exit if the TaskTracker
> doesn't respond. Calling exit in a multi-threaded program is really
> problematic. In particular, it prevents cleanup/finally clauses from running.
> We need to move to a model where it uses Thread.interrupt(), which means we
> need to check the interrupt flag in place in the map loop and reduce loop and
> stop masking the InterruptedExceptions.
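
For illustration, here is a minimal sketch of the two ideas discussed in this issue: tolerating a few failed pings before giving up on the parent, and stopping the work loop with Thread.interrupt() rather than exit so that finally clauses still run. This is not the Hadoop code. The names PingSketch, Worker, and pingLoop, the threshold of 3 failures, and the short sleep intervals are all assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class PingSketch {

    /** Hypothetical stand-in for the map or reduce loop. */
    static class Worker implements Runnable {
        final AtomicBoolean cleanedUp = new AtomicBoolean(false);

        @Override
        public void run() {
            try {
                // Check the interrupt flag on every iteration of the loop.
                while (!Thread.currentThread().isInterrupted()) {
                    Thread.sleep(10); // stands in for processing one record
                }
            } catch (InterruptedException e) {
                // Restore the flag instead of masking the interruption.
                Thread.currentThread().interrupt();
            } finally {
                // Cleanup that a call to System.exit() would have skipped.
                cleanedUp.set(true);
            }
        }
    }

    /**
     * Pings the parent while the worker is alive. After maxFailures
     * consecutive failed pings, interrupts the worker and returns.
     * The ping supplier is a hypothetical placeholder for the real call.
     */
    static void pingLoop(BooleanSupplier ping, Thread worker,
                         int maxFailures, long intervalMs)
            throws InterruptedException {
        int failures = 0;
        while (worker.isAlive()) {
            if (ping.getAsBoolean()) {
                failures = 0; // a successful ping resets the count
            } else if (++failures >= maxFailures) {
                worker.interrupt(); // ask the worker to stop; do not exit
                return;
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        Worker w = new Worker();
        Thread t = new Thread(w, "worker");
        t.start();
        // Simulate a parent that never answers.
        pingLoop(() -> false, t, 3, 10);
        t.join();
        System.out.println("cleanup ran: " + w.cleanedUp.get());
    }
}
```

With a ping that always fails, the worker is interrupted after the third attempt and its finally block still runs. A real task would also need every blocking call in the map and reduce paths to propagate or restore the interrupt, as the description notes.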
