Race condition exists in the method MapOutputLocation.getFile
-------------------------------------------------------------

                 Key: HADOOP-723
                 URL: http://issues.apache.org/jira/browse/HADOOP-723
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
            Reporter: Devaraj Das


There appears to be a race condition in the way the Reduces copy the map 
output files from the Maps. If a copier is blocked in the connect call (at the 
beginning of the method MapOutputLocation.getFile) to a Jetty on a Map, and the 
MapCopyLeaseChecker detects that the copier has been idle for too long, it 
issues an interrupt (read 'kill') to this thread and creates a new Copier 
thread. However, the original copier, still blocked trying to connect to Jetty 
on a Map, is not actually killed until the connect timeout expires. As soon as 
the connect returns (with an IOException), it deletes the map output file, 
which by then may have been (successfully) created by the new Copier thread. 
This causes the Sort phase for that reducer to fail with a 
FileNotFoundException.
One simple way to fix this is to not delete the file if the file was not 
created within this getFile method.
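
A minimal sketch of that idea follows. It is an illustration only, not the 
actual MapOutputLocation code: the class GuardedCopy, the Source interface and 
the copy method are hypothetical names, and Source merely stands in for the 
blocking connect to Jetty. The sketch records whether this particular call 
created the output file and cleans up on failure only in that case.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not part of Hadoop.
final class GuardedCopy {

    /** Stand-in for the blocking connect to a Map; may throw after a timeout. */
    interface Source {
        InputStream open() throws IOException;
    }

    /**
     * Copies the source into target. On failure, removes target only if this
     * call created it, so a file written by another copier is left alone.
     * Returns true on success, false on failure.
     */
    static boolean copy(Path target, Source source) {
        boolean createdHere = false;
        try (InputStream in = source.open()) {   // may block, then throw
            // CREATE_NEW fails if the file already exists, so reaching the
            // next statement means this call is the one that created it.
            try (OutputStream out = Files.newOutputStream(
                    target, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
                createdHere = true;
                in.transferTo(out);
            }
            return true;
        } catch (IOException e) {
            if (createdHere) {
                try {
                    Files.deleteIfExists(target);
                } catch (IOException ignored) {
                    // Best-effort cleanup of our own partial file.
                }
            }
            return false;
        }
    }
}
```

With this shape, a copier whose connect fails late returns without touching a 
file that a replacement copier has already written. Whether the real getFile 
should refuse to overwrite an existing file, as CREATE_NEW does here, is a 
separate design choice assumed only for the sketch.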

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
