[
http://issues.apache.org/jira/browse/HADOOP-277?page=comments#action_12414999 ]
Naveen Nalam commented on HADOOP-277:
-------------------------------------
Below is what mkdirs looks like according to the jad decompiler (File.class
is from Sun JDK 1.5).
It looks to me like if two processes/threads try to create "/a/b/c/d/e/"
when none of it exists yet, both will try to create "/a". One fails while the
other succeeds. The failing process returns failure early, while the other
process continues to create "b/c/d/e/". If the failing process then calls
exists("/a/b/c/d/e") after returning from mkdirs(), exists() could return
false because the other process is still creating the directories along the
path.
So Sameer's suggestion of traversing in getLocalPath() is probably the best
solution.
public boolean mkdirs()
{
    if(exists())
        return false;
    if(mkdir())
        return true;
    File file = null;
    try
    {
        file = getCanonicalFile();
    }
    catch(IOException ioexception)
    {
        return false;
    }
    String s = file.getParent();
    return s != null && (new File(s, fs.prefixLength(s))).mkdirs() &&
        file.mkdir();
}
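To make the traversal idea concrete, here is a minimal sketch of a mkdirs replacement that walks the path from the top and treats "mkdir() failed but the directory now exists" as success at every level. This is only my reading of the suggestion, not the attached patch; the class and method names (TolerantMkdirs, mkdirsTolerant) are made up for illustration.

```java
import java.io.File;

final class TolerantMkdirs {
    /**
     * Creates dir and any missing ancestors, parents first.
     * Hypothetical helper: at each level a failed mkdir() still counts as
     * success if the directory exists afterwards, which is what happens
     * when another thread or process created it first.
     */
    static boolean mkdirsTolerant(File dir) {
        if (dir.isDirectory()) {
            return true;                      // already there, nothing to do
        }
        File parent = dir.getAbsoluteFile().getParentFile();
        if (parent != null && !mkdirsTolerant(parent)) {
            return false;                     // an ancestor could not be created
        }
        // Either we create it, or someone else just did.
        return dir.mkdir() || dir.isDirectory();
    }
}
```

Because each level is re-checked right after its own mkdir(), a caller that loses the race on "/a" keeps going down the path instead of returning false early, and the method only returns true once the full path is a directory.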
> Race condition in Configuration.getLocalPath()
> ----------------------------------------------
>
> Key: HADOOP-277
> URL: http://issues.apache.org/jira/browse/HADOOP-277
> Project: Hadoop
> Type: Bug
> Environment: linux, 64 bit, dual core, 4x400GB disk, 4GB RAM
> Reporter: paul sutter
> Attachments: hadoop-277.patch, hadoop-task_1_r_9.log, mkdirs.patch
>
> (attached: a patch to fix the problem, and a logfile showing the problem
> occurring twice)
> There is a race condition in Configuration.java:
>     Path file = new Path(dirs[index], path);
>     Path dir = file.getParent();
>     if (fs.exists(dir) || fs.mkdirs(dir)) {
>       return file;
>     }
> If two threads simultaneously execute this code with the same target
> directory, fs.exists() will return false for both, but fs.mkdirs() will
> return true for only one of the two threads. From the Java documentation:
> "returns: true if and only if the directory was created, along with all
> necessary parent directories; false otherwise"
> That is, if the first thread successfully creates the directory, the second
> will not, and therefore return false, even though the directory exists.
> This was really happening. We use four temporary directories, and we had
> reducers failing all over the place with bizarre impossible errors. I
> modified the ReduceTaskRunner to output the filename that it creates to find
> the problem, and the log output is below.
> Here you can see copies initiated for two files that hash to the same temp
> directory, simultaneously. map_4.out is created in the correct directory
> (/data2...), but map_15.out is created in the next directory (/data3...)
> because of this race condition. Minutes later, when the appender tries to
> locate the file, that race condition does not occur (the directory already
> exists), and the appender looks for the file map_15.out in the correct
> directory, where it does not exist.
> 060605 142414 task_0001_r_000009_1 Copying task_0001_m_000004_0 output from
> rmr05.
> 060605 142414 task_0001_r_000009_1 Copying task_0001_m_000015_0 output from
> rmr04.
> ...
> 060605 142416 task_0001_r_000009_1 done copying task_0001_m_000004_0 output
> from rmr05 into /data2/tmp/mapred/local/task_0001_r_000009_1/map_4.out
> ...
> 060605 142418 task_0001_r_000009_1 done copying task_0001_m_000015_0 output
> from rmr04 into /data3/tmp/mapred/local/task_0001_r_000009_1/map_15.out
> ...
> 060605 142531 task_0001_r_000009_1 0.31808624% reduce > append >
> /data2/tmp/mapred/local/task_0001_r_000009_1/map_4.out
> ...
> 060605 142725 task_0001_r_000009_1 java.io.FileNotFoundException:
> /data2/tmp/mapred/local/task_0001_r_000009_1/map_15.out
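
The log above fits a selection loop that hashes the path to a temp directory and falls through to the next one when the directory check fails. A rough sketch of a loop that only falls through on a real failure is below. It assumes java.nio.file.Files.createDirectories (Java 7 and later, so not available when this issue was filed), which does not throw when the directory already exists. LocalPathSketch and this getLocalPath signature are hypothetical and are not Hadoop's actual code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class LocalPathSketch {
    /**
     * Hypothetical stand-in for getLocalPath(): starts at the directory the
     * path hashes to and moves to the next one only if the parent directory
     * genuinely cannot be created.
     */
    static Path getLocalPath(String[] dirs, String path) throws IOException {
        int hash = path.hashCode();
        for (int i = 0; i < dirs.length; i++) {
            // Mask the sign bit so the index is never negative.
            int index = ((hash + i) & Integer.MAX_VALUE) % dirs.length;
            Path file = Paths.get(dirs[index], path);
            Path dir = file.getParent();
            try {
                if (dir != null) {
                    // No exception if dir already exists as a directory,
                    // including when another thread created it meanwhile.
                    Files.createDirectories(dir);
                }
                return file;
            } catch (IOException e) {
                // This directory is unusable; try the next candidate.
            }
        }
        throw new IOException("No usable local directory for " + path);
    }
}
```

With a check like this, two copies that hash to the same temp directory resolve to the same place, so a later lookup of the same path finds the file where it was written.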
--
This message is automatically generated by JIRA.