[ http://issues.apache.org/jira/browse/HADOOP-277?page=comments#action_12414999 ] 

Naveen Nalam commented on HADOOP-277:
-------------------------------------

Below is what mkdirs() looks like according to the jad decompiler (File.class 
is from Sun JDK 1.5).

It looks to me like, if two processes/threads are trying to create 
"/a/b/c/d/e/" and nothing exists yet, they both try to create "/a". One will 
fail while the other succeeds. The failing process returns failure early, 
while the other process continues on to create "b/c/d/e/". If the failing 
process, after returning from mkdirs(), now calls exists("/a/b/c/d/e"), 
exists() could return false because the other process is still creating the 
directories along the path.

So probably Sameer's suggestion of traversing in getLocalPath() is the best 
solution.

public boolean mkdirs()
    {
        if(exists())
            return false;
        if(mkdir())
            return true;
        File file = null;
        try
        {
            file = getCanonicalFile();
        }
        catch(IOException ioexception)
        {
            return false;
        }
        String s = file.getParent();
        return s != null && (new File(s, fs.prefixLength(s))).mkdirs()
            && file.mkdir();
    }
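
For illustration, here is a minimal sketch of what creating the path one 
level at a time could look like, so that losing the race at any level is not 
treated as a failure. The class and method names (RaceTolerantMkdirs, 
ensureDirectory) are hypothetical and are not taken from Hadoop or from any 
of the attached patches; it only assumes plain java.io.File.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, not part of Hadoop or the JDK.
class RaceTolerantMkdirs {

    /**
     * Creates dir and any missing ancestors. At every level, "mkdir() failed
     * but the directory is there now" counts as success, so a thread that
     * loses the race for "/a" still goes on to create the deeper levels.
     */
    static boolean ensureDirectory(File dir) throws IOException {
        // Walk upward and remember every component that is not yet a directory.
        Deque<File> pending = new ArrayDeque<File>();
        for (File f = dir.getCanonicalFile();
             f != null && !f.isDirectory();
             f = f.getParentFile()) {
            pending.push(f); // the topmost missing ancestor ends up at the head
        }
        // Create from the top down, re-checking after each failed mkdir().
        for (File f : pending) {
            if (!f.mkdir() && !f.isDirectory()) {
                return false; // a real failure, e.g. a regular file is in the way
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        File base = Files.createTempDirectory("mkdirs-demo").toFile();
        File target = new File(base, "a/b/c/d/e");
        System.out.println(ensureDirectory(target)); // creates a, b, c, d, e
        System.out.println(ensureDirectory(target)); // nothing left to create
    }
}
```

Unlike File.mkdirs(), this returns true when the directory already exists, 
so a caller does not need a separate exists() check before or after it.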


> Race condition in Configuration.getLocalPath()
> ----------------------------------------------
>
>          Key: HADOOP-277
>          URL: http://issues.apache.org/jira/browse/HADOOP-277
>      Project: Hadoop
>         Type: Bug

>  Environment: linux, 64 bit, dual core, 4x400GB disk, 4GB RAM
>     Reporter: paul sutter
>  Attachments: hadoop-277.patch, hadoop-task_1_r_9.log, mkdirs.patch
>
> (attached: a patch to fix the problem, and a logfile showing the problem 
> occurring twice)
> There is a race condition in Configuration.java:
>        Path file = new Path(dirs[index], path);
>        Path dir = file.getParent();
>        if (fs.exists(dir) || fs.mkdirs(dir)) {
>          return file;
> If two threads simultaneously process this code with the same target 
> directory, fs.exists() will return false for both, but fs.mkdirs() will 
> return true for only one of the two threads. From the Java documentation:
>  "returns: true if and only if the directory was created, along with all 
> necessary parent directories; false otherwise"
> That is, if the first thread successfully creates the directory, the second 
> will not, and therefore return false, even though the directory exists.
> This was really happening. We use four temporary directories, and we had 
> reducers failing all over the place with bizarre impossible errors. I 
> modified the ReduceTaskRunner to output the filename that it creates to find 
> the problem, and the log output is below.
> Here you can see copies initiated for two files that hash to the same temp 
> directory, simultaneously. map_4.out is created in the correct directory 
> (/data2...), but map_15.out is created in the next directory (/data3...) 
> because of this race condition. Minutes later, when the appender tries to 
> locate the file, that race condition does not occur (the directory already 
> exists), and the appender looks for the file map_15.out in the correct 
> directory, where it does not exist.
> 060605 142414 task_0001_r_000009_1 Copying task_0001_m_000004_0 output from 
> rmr05.
> 060605 142414 task_0001_r_000009_1 Copying task_0001_m_000015_0 output from 
> rmr04.
> ...
> 060605 142416 task_0001_r_000009_1 done copying task_0001_m_000004_0 output 
> from rmr05 into /data2/tmp/mapred/local/task_0001_r_000009_1/map_4.out
> ...
> 060605 142418 task_0001_r_000009_1 done copying task_0001_m_000015_0 output 
> from rmr04 into /data3/tmp/mapred/local/task_0001_r_000009_1/map_15.out
> ...
> 060605 142531 task_0001_r_000009_1 0.31808624% reduce > append > 
> /data2/tmp/mapred/local/task_0001_r_000009_1/map_4.out
> ...
> 060605 142725 task_0001_r_000009_1 java.io.FileNotFoundException: 
> /data2/tmp/mapred/local/task_0001_r_000009_1/map_15.out
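
The interleaving in the quoted description can be replayed on a single thread 
to see why the check fails. This is only a sketch: java.io.File stands in for 
the Hadoop fs.exists()/fs.mkdirs() calls, and the class and method names 
(MkdirsReturnValue, replayLosingThread) are made up for the example. It 
assumes the target directory does not exist beforehand.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Illustration only: java.io.File stands in for the Hadoop FileSystem calls.
class MkdirsReturnValue {

    /**
     * Replays, on one thread, what the losing thread observes.
     * Returns { result of the original check, result of a reordered check }.
     */
    static boolean[] replayLosingThread(File dir) {
        boolean existedAtCheck = dir.exists(); // loser checks first: not there yet
        dir.mkdirs();                          // the "other thread" creates it now
        boolean created = dir.mkdirs();        // loser's mkdirs(): already exists
        boolean originalCheck = existedAtCheck || created;
        boolean reorderedCheck = created || dir.isDirectory();
        return new boolean[] { originalCheck, reorderedCheck };
    }

    public static void main(String[] args) throws IOException {
        File base = Files.createTempDirectory("hadoop-277-demo").toFile();
        boolean[] r = replayLosingThread(new File(base, "task/dir"));
        System.out.println("exists() || mkdirs():      " + r[0]);
        System.out.println("mkdirs() || isDirectory(): " + r[1]);
    }
}
```

Re-checking after mkdirs() covers the case where the directory is fully 
created by the time the loser looks again. It does not cover the multi-level 
case from the comment above, where the other thread is still creating the 
deeper directories when the re-check runs.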

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
