Race condition while launching task cleanup attempt.
----------------------------------------------------
                 Key: MAPREDUCE-1475
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1475
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: tasktracker
    Affects Versions: 0.20.1
            Reporter: Amareshwari Sriramadasu

We found a race condition while launching a task cleanup attempt on a TaskTracker; it permanently consumes a slot.

The scenario is the following:
1. The main attempt is killed by the TaskTracker because it was a speculative attempt.
2. A cleanup attempt is launched on the same tracker.
3. The cleanup attempt occupies the slot and is about to start, but an RPC, done(), from the earlier attempt is still pending in the RPC queue.
4. Before the cleanup attempt can be launched, the TaskTracker processes the RPC from the earlier attempt and sets the state of the cleanup attempt to KILLED.
5. The launcher does not launch the cleanup attempt because it is already KILLED.
6. The done() RPC itself fails with a NullPointerException because of the inconsistent state.

In summary, the slot was occupied by a cleanup attempt that could not be launched, and the slot was never released.
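The interleaving described above can be reproduced in a small, self-contained Java sketch. This is not TaskTracker code: the Tracker class, its methods (reserveSlot, staleDone, launch), the attempt id string, and the releaseOnSkippedLaunch flag are all hypothetical names made up for illustration. The sketch assumes the stale done() call resolves to the cleanup attempt's entry, and the "guarded" variant is only one possible mitigation, not necessarily the fix adopted for this issue.

```java
import java.util.HashMap;
import java.util.Map;

public class CleanupSlotRaceSketch {
    enum State { UNASSIGNED, RUNNING, KILLED }

    /** Hypothetical stand-in for a tracker with a fixed number of slots. */
    static class Tracker {
        private int freeSlots;
        private final boolean releaseOnSkippedLaunch; // assumption: toggles the guard
        private final Map<String, State> attempts = new HashMap<>();

        Tracker(int slots, boolean releaseOnSkippedLaunch) {
            this.freeSlots = slots;
            this.releaseOnSkippedLaunch = releaseOnSkippedLaunch;
        }

        /** The cleanup attempt takes a slot before it is actually launched. */
        synchronized void reserveSlot(String attemptId) {
            freeSlots--;
            attempts.put(attemptId, State.UNASSIGNED);
        }

        /** Models the pending done() RPC from the earlier attempt being processed late. */
        synchronized void staleDone(String attemptId) {
            attempts.put(attemptId, State.KILLED);
        }

        /** The launcher refuses to start an attempt that is already KILLED. */
        synchronized boolean launch(String attemptId) {
            if (attempts.get(attemptId) == State.KILLED) {
                if (releaseOnSkippedLaunch) {
                    freeSlots++; // hypothetical guard: give the slot back
                }
                return false;    // without the guard, the slot stays occupied
            }
            attempts.put(attemptId, State.RUNNING);
            return true;
        }

        synchronized int freeSlots() {
            return freeSlots;
        }
    }

    /** Replays the problematic order: reserve, stale done(), then launch. */
    static int freeSlotsAfterRace(boolean releaseOnSkippedLaunch) {
        Tracker tracker = new Tracker(2, releaseOnSkippedLaunch);
        String cleanupAttempt = "cleanup-attempt"; // hypothetical id
        tracker.reserveSlot(cleanupAttempt);
        tracker.staleDone(cleanupAttempt);
        tracker.launch(cleanupAttempt);
        return tracker.freeSlots();
    }

    public static void main(String[] args) {
        System.out.println("free slots without guard: " + freeSlotsAfterRace(false));
        System.out.println("free slots with guard:    " + freeSlotsAfterRace(true));
    }
}
```

With two slots configured, the unguarded run ends with one free slot even though nothing is running, which mirrors the leaked slot in the report; the guarded run returns to two.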