[ 
https://issues.apache.org/jira/browse/HADOOP-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500317
 ] 

dhruba borthakur commented on HADOOP-1396:
------------------------------------------

The DFSClient uses a random number generator to generate the name of the 
temporary file where the latest block of the file being written is cached. 
The above problem could theoretically occur if two instances of DFSClient get 
the same value from the random number generator at around the same time.
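
To illustrate the collision scenario, here is a minimal sketch. The class 
name and the tempName helper are hypothetical, and the "client-" prefix is 
only modeled on the path in the stack trace below; this is not the actual 
DFSClient code, and it makes no claim about how DFSClient seeds its generator. 
It only shows that two generators that start from the same seed produce the 
same file name.

```java
import java.util.Random;

public class TempNameCollision {
    // Hypothetical naming scheme: a fixed prefix plus a random long.
    static String tempName(Random r) {
        return "client-" + r.nextLong();
    }

    public static void main(String[] args) {
        // Assumed stand-in for two clients whose generators share a seed.
        long seed = 12345L;
        String a = tempName(new Random(seed));
        String b = tempName(new Random(seed));
        // Identical seeds give identical sequences, so the names collide.
        System.out.println(a.equals(b)); // prints "true"
    }
}
```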

I suspect that enabling speculative execution somehow results in more 
concurrent tasks on the same node, and this increases the probability 
of the same tmp file being used concurrently by multiple tasks. Hence we see 
this problem more often when speculative execution is switched on.

An alternative is to use File.createTempFile. This method creates the file 
atomically and never returns the name of a file that already exists.
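
A rough sketch of what that could look like follows. The class name, the 
newClientTemp helper, the "client-" prefix, and the ".tmp" suffix are all 
assumptions made for illustration, not the actual patch.

```java
import java.io.File;
import java.io.IOException;

public class UniqueTempFile {
    // Hypothetical helper: creates an empty, uniquely named file in dir.
    static File newClientTemp(File dir) throws IOException {
        // The prefix must be at least three characters long.
        return File.createTempFile("client-", ".tmp", dir);
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"));
        File a = newClientTemp(dir);
        File b = newClientTemp(dir);
        // Both files exist on disk and have different names.
        System.out.println(a.exists() && b.exists() && !a.equals(b));
        a.delete();
        b.delete();
    }
}
```

The uniqueness check happens when the file is created, so two clients do not 
need to coordinate on names.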


> FileNotFound exception on DFS block
> -----------------------------------
>
>                 Key: HADOOP-1396
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1396
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.12.3
>            Reporter: Devaraj Das
>             Fix For: 0.14.0
>
>
> Got a couple of exceptions of the form illustrated below. This was for a 
> randomwriter run (and every node in the cluster has multiple disks).
> java.io.FileNotFoundException: /tmp/dfs/data/tmp/client-8395631522349067878 (No such file or directory)
>       at java.io.FileInputStream.open(Native Method)
>       at java.io.FileInputStream.<init>(FileInputStream.java:106)
>       at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1323)
>       at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.flush(DFSClient.java:1274)
>       at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.write(DFSClient.java:1256)
>       at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:38)
>       at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>       at java.io.DataOutputStream.write(DataOutputStream.java:90)
>       at org.apache.hadoop.fs.ChecksumFileSystem$FSOutputSummer.write(ChecksumFileSystem.java:402)
>       at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:38)
>       at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>       at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>       at java.io.DataOutputStream.write(DataOutputStream.java:90)
>       at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:775)
>       at org.apache.hadoop.examples.RandomWriter$Map.map(RandomWriter.java:158)
>       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:187)
>       at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1709)
> So it seems like the bug reported in HADOOP-758 still exists.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
