I've been experiencing an issue where my mapred tasks hang after a
lengthy period of execution.  I believe I've found the problem and
wanted to get others' thoughts on it.

The problem seems to be that the sort progress thread in MapTask
(MapTask.java, line #196) does not stop after the sort completes, so
the call to join() (line #190) never returns.  This is because the
thread only catches InterruptedException and does not also check its
own interrupted flag.  According to the Javadocs, InterruptedException
is thrown only if the thread is blocked in sleep(), wait(), join(),
etc.; if it is interrupted during normal execution, only the
interrupted flag is set.  Can someone confirm this?  I'm going to patch
my install to see whether this is my problem, but I only seem to run
into it after several hours of processing, so I would like some
confirmation sooner.
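For anyone who wants to try the same kind of fix, here is a minimal
standalone sketch of the pattern I have in mind, assuming Java 8+.  It
is not the actual MapTask code: the class name, startProgressThread(),
and the reports counter are hypothetical stand-ins for the real
progress reporting.  The loop checks the interrupted flag on every
pass and also restores the flag when sleep() throws, so the thread
exits however the interrupt arrives.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ProgressThreadSketch {

    // Hypothetical stand-in for periodic progress reports; not Hadoop's API.
    static final AtomicInteger reports = new AtomicInteger();

    static Thread startProgressThread() {
        Thread t = new Thread(() -> {
            // Check the interrupted flag on every pass, so an interrupt that
            // arrives while the thread is busy (not sleeping) still ends the loop.
            while (!Thread.currentThread().isInterrupted()) {
                reports.incrementAndGet();
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    // sleep() clears the flag when it throws; restore it so the
                    // loop condition sees it and the thread exits.
                    Thread.currentThread().interrupt();
                }
            }
        }, "sort-progress");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread progress = startProgressThread();
        Thread.sleep(50);          // stand-in for the sort itself
        progress.interrupt();      // ask the progress thread to stop
        progress.join(5000);       // bounded join so a bug cannot hang the demo
        System.out.println("progress thread alive: " + progress.isAlive());
    }
}
```

Running main() should print "progress thread alive: false".  The
bounded join(5000) is only a safety net for the demo.  Note that
Thread.sleep() also throws immediately if the flag is already set when
it is called, so the explicit isInterrupted() check matters mainly when
the loop can do long work, or clear the flag, before reaching sleep().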

I did a search in JIRA, and it looks like there are patches
(HADOOP-1431) that might inadvertently solve this problem, but I didn't
see any single ticket that specifically describes this scenario.

Calvin
