[ 
https://issues.apache.org/jira/browse/HADOOP-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HADOOP-3546:
--------------------------------

    Status: Open  (was: Patch Available)

There is a race condition in the cleanup thread that can keep the thread from 
ever exiting: if the main thread sends the interrupt just as the cleanup thread 
is about to call tasksToCleanup.take(), the interrupt is lost and the cleanup 
thread stays in take() forever. Although we could handle the problem by 
introducing additional synchronization, I'd suggest that we remove the join on 
the threads and instead run them as daemons. I am nervous about adding a lot of 
synchronization code just to handle the case where files are left over on 
tasktracker exit. Since the tasktracker does the cleanup at startup anyway, we 
should be OK. 
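
To illustrate the suggestion, here is a minimal sketch of a cleanup thread 
that runs as a daemon and is never joined. It is not the TaskTracker code: 
the class name DaemonCleanupSketch, the method startCleanupThread(), the 
String task type and the thread name are all assumptions made for the 
example. Only the tasksToCleanup.take() call mirrors the real thread.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DaemonCleanupSketch {
    // Hypothetical stand-in for the tasktracker's queue of pending cleanups.
    static final BlockingQueue<String> tasksToCleanup =
        new LinkedBlockingQueue<String>();

    // Starts the cleanup loop on a daemon thread and returns that thread.
    static Thread startCleanupThread() {
        Thread t = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        // Blocks until there is something to clean up.
                        String task = tasksToCleanup.take();
                        System.out.println("cleaning up " + task);
                    }
                } catch (InterruptedException e) {
                    // Interrupted while waiting: leave the loop and exit.
                }
            }
        }, "taskCleanup");
        // A daemon thread does not keep the JVM alive, so nobody has to
        // join it on shutdown, even if it is still blocked in take().
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        startCleanupThread();
        tasksToCleanup.put("task_0001");
        Thread.sleep(100); // give the cleanup thread a moment to run
        // main returns without join(); the JVM exits although the daemon
        // thread is still waiting in take().
    }
}
```

The trade-off is the one described above: a cleanup that is in flight when 
the process exits may be left unfinished, which is acceptable only because 
the cleanup is repeated at startup.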

> TaskTracker re-initialization gets stuck in cleaning up
> -------------------------------------------------------
>
>                 Key: HADOOP-3546
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3546
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.18.0
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.18.0
>
>         Attachments: patch-3546.txt
>
>
> If the TaskTracker receives a reinit action, it gets stuck joining the task 
> cleanup thread. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
