[ https://issues.apache.org/jira/browse/HDFS-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008532#comment-13008532 ]

Lars Ailo Bongo commented on HDFS-1768:
---------------------------------------

In case my previous comment was unclear, I believe the following sequence caused
the error:
1. I ran a copyFromLocalFile that crashed after creating the checksum file but
before deleting that checksum file.
2. The content of status-test.txt was then changed, so its new checksum no
longer matches the one stored in the old, undeleted checksum file.
3. Subsequent copyFromLocalFile calls use the old checksum file and therefore
fail with a checksum error.
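The stale-checksum explanation can be illustrated without Hadoop. The following is a hypothetical, JDK-only model, not Hadoop's implementation: it assumes one CRC32 value per 512-byte chunk (512 being the default io.bytes.per.checksum), records the sums for the original content as the crashed run would have left them in the sidecar file, and then verifies edited content against those stale sums. StaleChecksumDemo and its methods are made-up names. Because verification is per chunk, the first mismatch is reported at a chunk boundary, which fits the "Checksum error: status-test.txt at 512" in the quoted report if the first 512 bytes of the file were unchanged.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

/**
 * Hypothetical model of per-chunk checksum verification; not Hadoop code.
 * Assumes CRC32 over fixed 512-byte chunks (the default io.bytes.per.checksum).
 */
public class StaleChecksumDemo {
    static final int BYTES_PER_CHECKSUM = 512;

    /** Computes one CRC32 value per BYTES_PER_CHECKSUM-sized chunk of data. */
    public static long[] chunkChecksums(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int off = i * BYTES_PER_CHECKSUM;
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - off);
            CRC32 crc = new CRC32();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /**
     * Returns the byte offset of the first chunk of current data whose CRC does
     * not match the stored (possibly stale) sums, or -1 if every chunk verifies.
     */
    public static long firstMismatch(byte[] current, long[] storedSums) {
        long[] fresh = chunkChecksums(current);
        for (int i = 0; i < fresh.length; i++) {
            if (i >= storedSums.length || fresh[i] != storedSums[i]) {
                return (long) i * BYTES_PER_CHECKSUM;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] original = new byte[1000];
        Arrays.fill(original, (byte) 'a');
        long[] stale = chunkChecksums(original); // sums left behind by the crashed run

        byte[] changed = original.clone();
        changed[700] = 'b';                      // the edit lands in the second chunk
        System.out.println(firstMismatch(changed, stale)); // prints 512
    }
}
```

With the same stale sums, an edit inside the first 512 bytes would instead be reported at offset 0.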

Something related happens if the checksum file is invalid, as in:

/home/larsab/troilkatt2/test-tmp/data>cat > .status-test.txt.crc
dsds
dsdsdsd
/home/larsab/troilkatt2/test-tmp/data>hadoop fs -put status-test.txt foo7
11/03/18 18:28:00 WARN fs.FSInputChecker: Problem opening checksum file: status-test.txt.  Ignoring exception: java.io.IOException: Not a checksum file: .status-test.txt.crc
        at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
        at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:284)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:222)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:170)
        at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1283)
        at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:134)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:1817)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:1960)
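This second case fails earlier, while the checksum file is being opened. The following is a hypothetical sketch of that header check, not Hadoop's code. It assumes the sidecar for a file is named "." + name + ".crc" and begins with the 4-byte magic 'c','r','c',0 followed by a big-endian int giving the bytes covered per checksum, which is how ChecksumFileSystem in the 0.20 line appears to lay it out (verify against your version). CrcHeaderCheck and its methods are made-up names. Judging by the WARN line, the real client logs the problem and continues without verifying instead of failing the put.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical model of the checksum-file header check; not Hadoop's code.
 * Assumed layout: 4-byte magic {'c','r','c',0}, then a big-endian int giving
 * the number of data bytes covered by each checksum.
 */
public class CrcHeaderCheck {
    static final byte[] MAGIC = {'c', 'r', 'c', 0};

    /** Assumed sidecar naming: "status-test.txt" -> ".status-test.txt.crc". */
    public static String checksumFileName(String name) {
        return "." + name + ".crc";
    }

    /** Builds a well-formed header for the given bytes-per-checksum value. */
    public static byte[] header(int bytesPerChecksum) {
        return ByteBuffer.allocate(MAGIC.length + 4).put(MAGIC).putInt(bytesPerChecksum).array();
    }

    /** Returns bytes-per-checksum, or throws if the magic bytes are wrong or missing. */
    public static int readHeader(byte[] crcFile) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(crcFile));
        byte[] magic = new byte[MAGIC.length];
        in.readFully(magic); // EOFException (an IOException) if the file is too short
        if (!Arrays.equals(magic, MAGIC)) {
            throw new IOException("Not a checksum file");
        }
        return in.readInt();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(checksumFileName("status-test.txt")); // .status-test.txt.crc
        System.out.println(readHeader(header(512)));             // 512
        // The two lines typed into the sidecar with cat in the session above.
        byte[] bogus = "dsds\ndsdsdsd\n".getBytes(StandardCharsets.US_ASCII);
        try {
            readHeader(bogus);
        } catch (IOException e) {
            System.out.println(e.getMessage());                  // Not a checksum file
        }
    }
}
```

The bytes written with cat start with "dsds", so they fail the magic check, which matches the "Not a checksum file" message in the log.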


> fs -put crash that depends on source file name
> ----------------------------------------------
>
>                 Key: HDFS-1768
>                 URL: https://issues.apache.org/jira/browse/HDFS-1768
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client, name-node
>    Affects Versions: 0.20.2
>         Environment: Cloudera CDH3B4 in pseudo-distributed mode on a Linux 2.6.32-28-generic #55-Ubuntu SMP x86_64 kernel, with Java HotSpot 64-Bit Server VM (build 19.1-b02, mixed mode)
>            Reporter: Lars Ailo Bongo
>            Priority: Minor
>
> I have a unit test that writes a file to HDFS using copyFromLocalFile.
> Sometimes the call fails with a checksum error. Once the issue has occurred,
> "hadoop fs -put <filename> <anywhere>" also fails as long as the source
> filename is the same as the one used in the unit test. The file content is
> never sent to the DataNode, so the destination file ends up with size zero.
> The error is not caused by the file content, and it does not depend on the
> HDFS destination name. Restarting the NameNode and DataNode does not resolve
> the issue. I have not been able to reproduce the error with a simple program,
> and I have not tested the issue in distributed or standalone mode.
> The only "fix" is to change the source filename.
> Below are the error output and the NameNode log. There is no entry for this
> operation in the DataNode log.
> /home/larsab/troilkatt2/test-tmp/data>hadoop fs -put status-test.txt status-test.txt3
> 11/03/18 16:59:54 INFO fs.FSInputChecker: Found checksum error: b[512, 
> 968]=3a646f6e650a323a7365636f6e6453746167653a73746172740a323a7365636f6e6453746167653a646f6e650a323a746869726453746167653a73746172740a323a746869726453746167653a646f6e650a323a74686553696e6b3a73746172740a323a74686553696e6b3a646f6e650a323a54726f696c6b6174743a646f6e650a333a54726f696c6b6174743a73746172740a333a746865536f757263653a73746172740a333a746865536f757263653a646f6e650a333a666972737453746167653a73746172740a333a666972737453746167653a646f6e650a333a7365636f6e6453746167653a73746172740a333a7365636f6e6453746167653a646f6e650a333a746869726453746167653a73746172740a333a746869726453746167653a646f6e650a333a74686553696e6b3a73746172740a333a74686553696e6b3a646f6e650a333a54726f696c6b6174743a646f6e650a343a54726f696c6b6174743a73746172740a343a746865536f757263653a73746172740a343a746865536f757263653a646f6e650a343a666972737453746167653a73746172740a343a666972737453746167653a646f6e650a343a7365636f6e6453746167653a7265636f7665720a
> org.apache.hadoop.fs.ChecksumException: Checksum error: status-test.txt at 512
>       at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
>       at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
>       at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
>       at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
>       at java.io.DataInputStream.read(DataInputStream.java:83)
>       at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:49)
>       at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:87)
>       at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:224)
>       at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:170)
>       at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1283)
>       at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:134)
>       at org.apache.hadoop.fs.FsShell.run(FsShell.java:1817)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>       at org.apache.hadoop.fs.FsShell.main(FsShell.java:1960)
> put: Checksum error: status-test.txt at 512
> NameNode log:
> 2011-03-18 16:59:54,422 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 13 Total time for transactions(ms): 1Number of transactions batched in Syncs: 0 Number of syncs: 7 SyncTimes(ms): 220 
> 2011-03-18 16:59:54,444 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=larsab    ip=/127.0.0.1   cmd=create      src=/user/larsab/status-test.txt3       dst=null        perm=larsab:supergroup:rw-r--r--
> 2011-03-18 16:59:54,469 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on  file /user/larsab/status-test.txt3 from client DFSClient_-1004170418
> 2011-03-18 16:59:54,469 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /user/larsab/status-test.txt3 is closed by DFSClient_-1004170418
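If the stale-checksum explanation in the comment is right, renaming the source file works only because the sidecar checksum file is keyed by the source filename, so deleting the leftover sidecar before retrying should have the same effect. A minimal JDK-only sketch, assuming the "." + name + ".crc" naming (Hadoop's ChecksumFileSystem.getChecksumFile(Path) appears to build the same name); StaleCrcCleanup and its methods are hypothetical. Without a sidecar, the local file should then be read without client-side checksum verification, so this is only appropriate when the current local content is known to be good.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that removes a leftover ".<name>.crc" next to a local source file. */
public class StaleCrcCleanup {

    /** Assumed sidecar location: same directory, "." + file name + ".crc". */
    public static Path checksumSidecar(Path source) {
        return source.resolveSibling("." + source.getFileName() + ".crc");
    }

    /** Deletes the sidecar if present; returns true if a file was actually removed. */
    public static boolean removeStaleChecksum(Path source) throws IOException {
        return Files.deleteIfExists(checksumSidecar(source));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("crc-demo");
        Path source = Files.write(dir.resolve("status-test.txt"),
                "new content\n".getBytes(StandardCharsets.UTF_8));
        // Simulate the sidecar left behind by the crashed run.
        Files.write(checksumSidecar(source), new byte[] {'c', 'r', 'c', 0});
        System.out.println(removeStaleChecksum(source)); // true: sidecar deleted
        System.out.println(removeStaleChecksum(source)); // false: nothing left to delete
    }
}
```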

--
This message is automatically generated by JIRA.