I've been running into an issue where my mapred tasks hang after a lengthy period of execution. I believe I've found the cause and wanted to get others' thoughts on it.
The problem seems to be that MapTask's (MapTask.java) sort progress thread (line #196) does not stop after the sort completes, so the call to join() (line #190) never returns. That thread only catches InterruptedException and never checks the thread's interrupted flag. According to the Javadocs, an InterruptedException is thrown only if the thread is inside a blocking call such as sleep(), wait(), or join(); if the interrupt arrives during normal operation, only the interrupted flag is set.
Can someone confirm this? I'm going to patch my install to see whether this is my problem, but I only seem to hit it after several hours of processing, so I would like to get confirmation sooner.
I searched JIRA and it looks like there are patches (HADOOP-1431) that might inadvertently solve this problem, but I didn't see any single ticket that specifically describes this scenario.
Calvin
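To illustrate the kind of change being proposed, here is a minimal sketch of a progress loop that exits on either signal: the interrupted flag or the InterruptedException. This is not the actual MapTask code; the class name, the counter standing in for the progress report, the thread name, and the intervals are all made up for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the sort progress thread; not the real MapTask code. */
final class ProgressThreadSketch {

    /** Counts "progress reports"; a placeholder for whatever the real thread reports. */
    static final AtomicInteger reports = new AtomicInteger();

    /** Builds a progress loop that stops on the interrupted flag or on the exception. */
    static Thread newProgressThread(long intervalMillis) {
        Thread t = new Thread(() -> {
            // Check the flag on every pass, not only inside the blocking call.
            while (!Thread.currentThread().isInterrupted()) {
                reports.incrementAndGet();
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    // sleep() clears the flag when it throws, so set it again;
                    // the loop condition then sees it and the thread exits.
                    Thread.currentThread().interrupt();
                }
            }
        }, "sort-progress-sketch");
        t.setDaemon(true);
        return t;
    }

    /** Starts the thread, interrupts it, and reports whether join() returned in time. */
    static boolean stopsAfterInterrupt() {
        Thread t = newProgressThread(10);
        t.start();
        try {
            Thread.sleep(50);   // let the loop run a few times
            t.interrupt();      // what the parent would do once the sort is done
            t.join(2000);       // bounded wait so a stuck thread cannot hang the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !t.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("stopped: " + stopsAfterInterrupt());
    }
}
```

One caveat worth checking against the real code: Thread.sleep() also throws InterruptedException immediately if the flag is already set when it is called, so the explicit flag check matters most when something between two sleeps clears or swallows the interrupt. Using a bounded join() in the parent is a separate safeguard against the hang itself.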
