[
https://issues.apache.org/jira/browse/HADOOP-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Devaraj Das updated HADOOP-1513:
--------------------------------
Status: Open (was: Patch Available)
Ok, I realized that what I said in my last comment holds only for a single
mkdir() call, but we are making a mkdirs() call (which internally makes a chain
of mkdir() calls, one for each component in the path). mkdirs() will return
false if any of those mkdir() calls returns false. So here is a case where
breaking up the expression evaluated within the 'if' statement will not solve
the problem:
{noformat}
dir.mkdirs();
if (!dir.exists()) {
  throw new DiskErrorException("can not create directory: "
                               + dir.toString());
}
{noformat}
Suppose two threads/processes (t1 and t2) enter the mkdirs() call, t1 makes the
first few (successful) mkdir() calls, and then t2 gets to run. t2's mkdirs()
returns false immediately, since the first component in the path already exists
(t1 just created it). t2 then goes on to the exists() call, which might also
return false because t1 may not yet have created the entire directory tree.
Thus the exception is thrown, and that is not right.
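To make the interleaving concrete, one possible schedule looks like this
(purely illustrative; the path /a/b/c is hypothetical):
{noformat}
t1: mkdir(/a)      -> true   (t1 creates the first component)
t2: mkdir(/a)      -> false  (component already exists), so t2's mkdirs() returns false
t2: exists(/a/b/c) -> false  (t1 has not created /a/b and /a/b/c yet)
t2: throws DiskErrorException, even though the directory is about to appear
t1: mkdir(/a/b)    -> true
t1: mkdir(/a/b/c)  -> true
{noformat}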
We have to make the above exists() check for each component in the path whose
mkdir() call fails. So we could have a custom implementation of mkdirs(), say
mkdirsExists(), that returns false only if, for some component, both mkdir()
and exists() return false:
{noformat}
boolean mkdirsExists(String path) {
  ...........
  if (!component.mkdir() && !component.exists()) {
    return false;
  }
  ..........
}
{noformat}
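To flesh that out, here is a minimal sketch of such a helper, assuming a
File-based signature and the name mkdirsExists() purely for illustration; the
idea is simply to treat a path component as created if either mkdir() or
exists() succeeds for it. checkDir() would then throw the DiskErrorException
only when this returns false.
{noformat}
import java.io.File;

public class DiskCheckerSketch {
  /**
   * Race-tolerant variant of File.mkdirs(): a path component counts as
   * created if our own mkdir() succeeds OR the component already exists
   * (for example, because another thread/process created it concurrently).
   * Like the pseudocode above, this uses exists(), so a pre-existing
   * regular file with the same name would also pass the check.
   */
  static boolean mkdirsExists(File dir) {
    if (dir == null) {
      return false;
    }
    File parent = dir.getParentFile();
    // Make sure the parent chain exists first; the recursion ends at the
    // root (or at a relative path's first component), whose parent is null.
    if (parent != null && !mkdirsExists(parent)) {
      return false;
    }
    // Fail only if we can neither create this component nor see it.
    return dir.mkdir() || dir.exists();
  }
}
{noformat}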
Makes sense?
> A likely race condition between the creation of a directory and checking for
> its existence in the DiskChecker class
> -------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-1513
> URL: https://issues.apache.org/jira/browse/HADOOP-1513
> Project: Hadoop
> Issue Type: Bug
> Components: fs
> Affects Versions: 0.14.0
> Reporter: Devaraj Das
> Assignee: Devaraj Das
> Priority: Critical
> Fix For: 0.14.0
>
> Attachments: 1513.patch
>
>
> Got this exception in a job run. It looks like the problem is a race
> condition between the creation of a directory and checking for its existence.
> Specifically, the line:
> if (!dir.exists() && !dir.mkdirs()), doesn't seem safe when invoked by
> multiple processes at the same time.
> 2007-06-21 07:55:33,583 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
> 2007-06-21 07:55:33,818 WARN org.apache.hadoop.fs.AllocatorPerContext: org.apache.hadoop.util.DiskChecker$DiskErrorException: can not create directory: /export/crawlspace/kryptonite/ddas/dfs/data/tmp
> at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:26)
> at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:211)
> at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:248)
> at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createTmpFileForWrite(LocalDirAllocator.java:276)
> at org.apache.hadoop.fs.LocalDirAllocator.createTmpFileForWrite(LocalDirAllocator.java:155)
> at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.newBackupFile(DFSClient.java:1171)
> at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:1136)
> at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:342)
> at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.create(DistributedFileSystem.java:145)
> at org.apache.hadoop.fs.ChecksumFileSystem$FSOutputSummer.<init>(ChecksumFileSystem.java:368)
> at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:443)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:254)
> at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:675)
> at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:165)
> at org.apache.hadoop.examples.RandomWriter$Map.map(RandomWriter.java:137)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:189)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1740)
> 2007-06-21 07:55:33,821 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.