[ http://issues.apache.org/jira/browse/HADOOP-277?page=all ]

Sameer Paranjpye updated HADOOP-277:
------------------------------------

    Version: 0.3.1

> Race condition in Configuration.getLocalPath()
> ----------------------------------------------
>
>          Key: HADOOP-277
>          URL: http://issues.apache.org/jira/browse/HADOOP-277
>      Project: Hadoop
>         Type: Bug

>     Versions: 0.3.1
>  Environment: linux, 64 bit, dual core, 4x400GB disk, 4GB RAM
>     Reporter: paul sutter
>     Assignee: Sameer Paranjpye
>  Attachments: hadoop-277.patch, hadoop-task_1_r_9.log, mkdir-p.patch.txt, 
> mkdirs.patch
>
> (attached: a patch to fix the problem, and a logfile showing the problem 
> occurring twice)
> There is a race condition in Configuration.java:
>        Path file = new Path(dirs[index], path);
>        Path dir = file.getParent();
>        if (fs.exists(dir) || fs.mkdirs(dir)) {
>          return file;
> If two threads execute this code simultaneously with the same target 
> directory, fs.exists() returns false in both, but fs.mkdirs() returns true 
> in only one of them. From the Java documentation:
>  "returns: true if and only if the directory was created, along with all 
> necessary parent directories; false otherwise"
> That is, if the first thread successfully creates the directory, the second 
> does not, and so mkdirs() returns false even though the directory exists.
> This was really happening. We use four temporary directories, and we had 
> reducers failing all over the place with bizarre, impossible errors. I 
> modified the ReduceTaskRunner to output the filename that it creates to find 
> the problem, and the log output is below.
> Here you can see copies initiated for two files that hash to the same temp 
> directory, simultaneously. map_4.out is created in the correct directory 
> (/data2...), but map_15.out is created in the next directory (/data3...) 
> because of this race condition. Minutes later, when the appender tries to 
> locate the file, that race condition does not occur (the directory already 
> exists), and the appender looks for the file map_15.out in the correct 
> directory, where it does not exist.
> 060605 142414 task_0001_r_000009_1 Copying task_0001_m_000004_0 output from 
> rmr05.
> 060605 142414 task_0001_r_000009_1 Copying task_0001_m_000015_0 output from 
> rmr04.
> ...
> 060605 142416 task_0001_r_000009_1 done copying task_0001_m_000004_0 output 
> from rmr05 into /data2/tmp/mapred/local/task_0001_r_000009_1/map_4.out
> ...
> 060605 142418 task_0001_r_000009_1 done copying task_0001_m_000015_0 output 
> from rmr04 into /data3/tmp/mapred/local/task_0001_r_000009_1/map_15.out
> ...
> 060605 142531 task_0001_r_000009_1 0.31808624% reduce > append > 
> /data2/tmp/mapred/local/task_0001_r_000009_1/map_4.out
> ...
> 060605 142725 task_0001_r_000009_1 java.io.FileNotFoundException: 
> /data2/tmp/mapred/local/task_0001_r_000009_1/map_15.out
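
For illustration only, here is a minimal sketch of one way to make the
check tolerant of the race described above. It is not the attached patch.
It assumes plain java.io.File rather than the Hadoop FileSystem API, and
the class name SafeMkdirs and method name ensureDirectory are hypothetical.
The idea is to try the creation first and, if mkdirs() reports false,
re-check whether the directory exists, since another thread may have
created it in the meantime.

```java
import java.io.File;
import java.util.concurrent.CountDownLatch;

public class SafeMkdirs {

    /**
     * Hypothetical helper: returns true if dir is a directory when the call
     * returns, whether this thread or another one created it.
     */
    public static boolean ensureDirectory(File dir) {
        // mkdirs() returns false when the directory already exists, so a
        // false result alone does not mean failure; re-check afterwards.
        return dir.mkdirs() || dir.isDirectory();
    }

    public static void main(String[] args) throws InterruptedException {
        // Illustrative target path under the JVM's temp directory.
        final File dir = new File(System.getProperty("java.io.tmpdir"),
                "hadoop-277-demo-" + System.nanoTime() + File.separator + "task");
        final CountDownLatch start = new CountDownLatch(1);
        final boolean[] ok = new boolean[2];
        Thread[] workers = new Thread[2];

        for (int i = 0; i < workers.length; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // release both threads at the same time
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                ok[id] = ensureDirectory(dir);
            });
            workers[i].start();
        }

        start.countDown();
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("thread 0: " + ok[0] + ", thread 1: " + ok[1]);
    }
}
```

With the original ordering (exists() first, then mkdirs()), the thread
that loses the race sees false from both calls and falls through to the
next directory. With the re-check after mkdirs(), both threads should see
the same directory as usable.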

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
